2023-11-27 11:49:48,606 INFO [train_asr.py:1303] (1/4) Training started
2023-11-27 11:49:48,606 INFO [train_asr.py:1313] (1/4) Device: cuda:1
2023-11-27 11:49:48,608 INFO [train_asr.py:1325] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-0423201334-6587bbc68d-tn554', 'IP address': '10.177.74.211'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 60, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': False, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-27 11:49:48,608 INFO [train_asr.py:1334] (1/4) About to create model
2023-11-27 11:49:49,317 INFO [train_asr.py:1338] (1/4) Number of model parameters: 65819362
2023-11-27 11:49:49,317 INFO [train_asr.py:1362] (1/4) Using CED labels!
2023-11-27 11:49:49,317 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-27 11:49:52,677 INFO [train_asr.py:1370] (1/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-27 11:49:55,369 INFO [train_asr.py:1379] (1/4) Using DDP
2023-11-27 11:49:55,805 INFO [train_asr.py:1402] (1/4) Loading optimizer state dict
2023-11-27 11:49:56,431 INFO [train_asr.py:1410] (1/4) Loading scheduler state dict
2023-11-27 11:49:56,449 INFO [train_asr.py:1432] (1/4) Getting audioset cuts
2023-11-27 11:49:56,450 INFO [kd_datamodule.py:784] (1/4) About to get the audioset cuts.
2023-11-27 11:49:56,451 INFO [train_asr.py:1438] (1/4) Using mux to combine Librispeech with audioset
2023-11-27 11:49:56,452 INFO [train_asr.py:1449] (1/4) CutSet(len=2748469) [underlying data type: ]
2023-11-27 11:50:05,336 INFO [kd_datamodule.py:396] (1/4) Enable MUSAN
2023-11-27 11:50:05,336 INFO [kd_datamodule.py:397] (1/4) About to get Musan cuts
2023-11-27 11:50:08,223 INFO [kd_datamodule.py:427] (1/4) Enable SpecAugment
2023-11-27 11:50:08,223 INFO [kd_datamodule.py:428] (1/4) Time warp factor: 80
2023-11-27 11:50:08,223 INFO [kd_datamodule.py:438] (1/4) Num frame mask: 10
2023-11-27 11:50:08,224 INFO [kd_datamodule.py:451] (1/4) About to create train dataset
2023-11-27 11:50:08,224 INFO [kd_datamodule.py:487] (1/4) Using SimpleCutSampler
2023-11-27 11:50:08,225 INFO [kd_datamodule.py:495] (1/4) About to create train dataloader
2023-11-27 11:50:08,227 INFO [kd_datamodule.py:802] (1/4) About to get the audioset eval cuts.
2023-11-27 11:50:08,228 INFO [train_asr.py:1513] (1/4) CutSet(len=20681) [underlying data type: ]
2023-11-27 11:50:08,284 INFO [kd_datamodule.py:529] (1/4) About to create dev dataset
2023-11-27 11:50:08,731 INFO [kd_datamodule.py:550] (1/4) About to create dev dataloader
2023-11-27 11:50:08,731 INFO [train_asr.py:1527] (1/4) Loading grad scaler state dict
2023-11-27 11:50:28,612 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 0, loss[loss=0.08627, simple_loss=0.1088, pruned_loss=0.01357, audio_tagging_loss=0.01828, over 15635.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1088, pruned_loss=0.01357, audio_tagging_loss=0.01828, over 15635.00 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:50:28,612 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-27 11:50:47,787 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8250, 4.9664, 5.0957, 4.8648], device='cuda:1')
2023-11-27 11:50:50,255 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3038, 4.3178, 4.5084, 4.4469], device='cuda:1')
2023-11-27 11:51:02,902 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.0578, simple_loss=0.05083, pruned_loss=0.005245, audio_tagging_loss=0.02714, over 4681554.00 frames.
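The "Using mux to combine Librispeech with audioset" step above refers to lhotse's CutSet.mux, which lazily interleaves several cut sets by sampling from them with given weights; the combined CutSet(len=2748469) logged above is the mix of the LibriSpeech and AudioSet sources. A minimal sketch of such a combination, with manifest paths and weights that are illustrative assumptions rather than the exact values used by train_asr.py:

from lhotse import CutSet

# Illustrative manifest names; the real manifests live under data/fbank.
libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train-all-shuf.jsonl.gz")
audioset_cuts = CutSet.from_file("data/fbank/cuts_audioset_unbalanced.jsonl.gz")

# CutSet.mux draws from each source with probability proportional to its
# weight, yielding a single lazy CutSet that mixes ASR and audio-tagging data.
train_cuts = CutSet.mux(libri_cuts, audioset_cuts, weights=[0.5, 0.5])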
2023-11-27 11:51:02,903 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-27 11:51:07,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3046020.0, ans=0.125
2023-11-27 11:51:12,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3046020.0, ans=15.0
2023-11-27 11:51:18,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-27 11:51:24,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3046086.6666666665, ans=0.2
2023-11-27 11:51:44,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046220.0, ans=0.1
2023-11-27 11:51:55,805 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-27 11:52:01,317 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 50, loss[loss=0.05848, simple_loss=0.07053, pruned_loss=0.009098, audio_tagging_loss=0.01411, over 15517.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.08577, pruned_loss=0.01233, audio_tagging_loss=0.01748, over 682278.77 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:52:14,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3046420.0, ans=0.125
2023-11-27 11:52:25,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.249e+01 9.464e+01 1.034e+02 1.107e+02 1.312e+02, threshold=2.068e+02, percent-clipped=0.0
2023-11-27 11:52:25,866 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:52:30,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3046486.6666666665, ans=0.0
2023-11-27 11:52:54,133 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-27 11:53:00,048 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 100, loss[loss=0.06201, simple_loss=0.06477, pruned_loss=0.01118, audio_tagging_loss=0.01844, over 15076.00 frames. ], tot_loss[loss=0.07326, simple_loss=0.08847, pruned_loss=0.01256, audio_tagging_loss=0.01647, over 1208240.38 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:53:02,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3046686.6666666665, ans=0.125
2023-11-27 11:53:31,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3046820.0, ans=0.95
2023-11-27 11:53:33,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3046886.6666666665, ans=0.0
2023-11-27 11:53:51,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-27 11:53:56,554 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 150, loss[loss=0.09556, simple_loss=0.1192, pruned_loss=0.02616, audio_tagging_loss=0.009803, over 15227.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.08982, pruned_loss=0.01272, audio_tagging_loss=0.01489, over 1618943.79 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
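The scaling.py:213 entries above report ScheduledFloat values: many Zipformer regularization hyper-parameters (dropout probabilities, skip rates, balancer limits) are not constants but piecewise-linear functions of the global batch count, and the logged ans is the schedule evaluated at the current batch_count. A simplified sketch of the idea, not icefall's exact ScheduledFloat implementation:

from bisect import bisect_right

class ScheduledFloat:
    """Piecewise-linear schedule over the global batch count (sketch)."""
    def __init__(self, *points):
        self.points = sorted(points)        # (batch_count, value) pairs
        self.batch_count = 0.0              # updated by the training loop

    def __float__(self):
        xs = [x for x, _ in self.points]
        x = self.batch_count
        if x <= xs[0]:
            return float(self.points[0][1])
        if x >= xs[-1]:
            return float(self.points[-1][1])
        i = bisect_right(xs, x)
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20000 batches:
p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
p.batch_count = 3046220.0                   # far past the end of the ramp
print(float(p))                             # 0.1, like "ans=0.1" above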
2023-11-27 11:54:02,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3047020.0, ans=0.125
2023-11-27 11:54:06,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3047086.6666666665, ans=0.125
2023-11-27 11:54:13,399 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:54:19,296 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.988e+01 9.589e+01 1.001e+02 1.163e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-27 11:54:36,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:54:47,786 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-27 11:54:53,338 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 200, loss[loss=0.07275, simple_loss=0.1031, pruned_loss=0.01305, audio_tagging_loss=0.008154, over 13762.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09279, pruned_loss=0.01304, audio_tagging_loss=0.01302, over 1937467.52 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:55:02,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3047353.3333333335, ans=0.125
2023-11-27 11:55:14,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3047420.0, ans=0.125
2023-11-27 11:55:15,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=22.5
2023-11-27 11:55:33,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3047553.3333333335, ans=0.1
2023-11-27 11:55:39,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3047620.0, ans=0.0
2023-11-27 11:55:41,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3047620.0, ans=0.025
2023-11-27 11:55:45,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-27 11:55:51,213 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 250, loss[loss=0.07654, simple_loss=0.1061, pruned_loss=0.01462, audio_tagging_loss=0.008862, over 15077.00 frames. ], tot_loss[loss=0.07156, simple_loss=0.09321, pruned_loss=0.01314, audio_tagging_loss=0.01182, over 2183155.01 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
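The optim.py:476 lines summarize the distribution of recent gradient norms: the five numbers read as min, 25%, 50%, 75% and max, and the clipping threshold equals Clipping_scale times the median (in the first such entry above, 2.0 x 1.034e+02 = 2.068e+02), with percent-clipped reporting how often recent batches exceeded it. A simplified sketch of that bookkeeping; in icefall this lives inside the optimizer, and the class below is only an illustrative stand-in:

import torch

class GradNormClipper:
    """Clip gradients at clipping_scale * median of recent norms (sketch)."""
    def __init__(self, clipping_scale=2.0, history=128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []                     # recent total gradient norms

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        self.norms = (self.norms + [norm.item()])[-self.history:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()   # 2.0 * median
        if norm.item() > threshold:         # rescale oversized gradients
            for g in grads:
                g.mul_(threshold / norm)
        return q, threshold                 # the quantities printed above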
2023-11-27 11:55:52,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3047686.6666666665, ans=0.015
2023-11-27 11:55:59,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3047686.6666666665, ans=0.5
2023-11-27 11:56:14,334 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.936e+01 9.538e+01 1.043e+02 1.286e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 11:56:18,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3047820.0, ans=0.0
2023-11-27 11:56:23,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2023-11-27 11:56:42,269 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457200
2023-11-27 11:56:48,612 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 300, loss[loss=0.07294, simple_loss=0.1001, pruned_loss=0.01496, audio_tagging_loss=0.007916, over 15846.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09269, pruned_loss=0.01316, audio_tagging_loss=0.01095, over 2375918.68 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:56:58,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-27 11:56:58,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0
2023-11-27 11:57:00,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3048086.6666666665, ans=0.0
2023-11-27 11:57:07,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048086.6666666665, ans=0.1
2023-11-27 11:57:20,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048153.3333333335, ans=0.125
2023-11-27 11:57:20,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048153.3333333335, ans=0.1
2023-11-27 11:57:22,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3048220.0, ans=0.0
2023-11-27 11:57:28,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3048220.0, ans=0.125
2023-11-27 11:57:31,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048220.0, ans=0.125
2023-11-27 11:57:32,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3048286.6666666665, ans=0.1
2023-11-27 11:57:35,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0
2023-11-27 11:57:39,490 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-27 11:57:44,919 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 350, loss[loss=0.07272, simple_loss=0.1046, pruned_loss=0.01393, audio_tagging_loss=0.006496, over 15340.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.09354, pruned_loss=0.0133, audio_tagging_loss=0.01021, over 2523160.65 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 8.0
2023-11-27 11:57:46,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3048353.3333333335, ans=0.2
2023-11-27 11:57:46,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3048353.3333333335, ans=0.07
2023-11-27 11:58:05,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5
2023-11-27 11:58:07,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048486.6666666665, ans=0.125
2023-11-27 11:58:10,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.638e+01 9.224e+01 9.880e+01 1.297e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-27 11:58:25,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0
2023-11-27 11:58:36,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-27 11:58:42,181 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 400, loss[loss=0.05187, simple_loss=0.0698, pruned_loss=0.008451, audio_tagging_loss=0.008516, over 15973.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09216, pruned_loss=0.01293, audio_tagging_loss=0.009892, over 2641171.11 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:58:47,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3048686.6666666665, ans=0.07
2023-11-27 11:59:00,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0
2023-11-27 11:59:05,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3048820.0, ans=0.125
2023-11-27 11:59:14,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3048886.6666666665, ans=0.2
2023-11-27 11:59:23,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3048886.6666666665, ans=0.125
2023-11-27 11:59:31,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3048953.3333333335, ans=0.0
2023-11-27 11:59:32,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-27 11:59:32,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3048953.3333333335, ans=0.035
2023-11-27 11:59:38,824 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 450, loss[loss=0.07529, simple_loss=0.09725, pruned_loss=0.0166, audio_tagging_loss=0.01006, over 14756.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.09182, pruned_loss=0.01277, audio_tagging_loss=0.009635, over 2729127.75 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 16.0
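In the per-batch loss[...] tuples, the reported loss is consistent with the scales from the config dump at the top: loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, with simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 (icefall's pruned-transducer recipes ramp the simple/pruned balance during warm-up, but at batch_idx ~457k that ramp has long finished). For the batch 400 entry above, 0.5 * 0.0698 + 0.008451 + 0.008516 = 0.05187, matching the logged value; tot_loss[...] is the same quantity as a running frame-weighted average. A small sketch of the combination (the function name is illustrative):

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted sum matching the decomposition printed by train_asr.py:1235.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 400 above: loss=0.05187, simple_loss=0.0698,
# pruned_loss=0.008451, audio_tagging_loss=0.008516.
print(round(combined_loss(0.0698, 0.008451, 0.008516), 5))  # 0.05187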
2023-11-27 11:59:54,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3049086.6666666665, ans=0.125
2023-11-27 11:59:55,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3049086.6666666665, ans=0.0
2023-11-27 12:00:02,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.448e+01 9.046e+01 9.688e+01 1.234e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-27 12:00:08,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.53 vs. limit=10.0
2023-11-27 12:00:09,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3049153.3333333335, ans=0.2
2023-11-27 12:00:12,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3049220.0, ans=0.125
2023-11-27 12:00:29,403 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-27 12:00:35,324 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 500, loss[loss=0.06872, simple_loss=0.09784, pruned_loss=0.01246, audio_tagging_loss=0.007338, over 16455.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09182, pruned_loss=0.0127, audio_tagging_loss=0.009297, over 2804365.24 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:00:37,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3049353.3333333335, ans=0.125
2023-11-27 12:00:46,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3049420.0, ans=0.125
2023-11-27 12:00:46,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2023-11-27 12:01:04,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=12.0
2023-11-27 12:01:06,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3049486.6666666665, ans=0.1
2023-11-27 12:01:08,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3049553.3333333335, ans=0.125
2023-11-27 12:01:09,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3049553.3333333335, ans=0.0
2023-11-27 12:01:18,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049553.3333333335, ans=0.1
2023-11-27 12:01:25,889 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-27 12:01:27,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3049620.0, ans=0.0
2023-11-27 12:01:31,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0
2023-11-27 12:01:32,371 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 550, loss[loss=0.05571, simple_loss=0.07939, pruned_loss=0.008383, audio_tagging_loss=0.007634, over 14690.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09021, pruned_loss=0.01257, audio_tagging_loss=0.00928, over 2856755.40 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:01:56,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.636e+01 9.349e+01 1.005e+02 1.321e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-27 12:01:59,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3049820.0, ans=0.125
2023-11-27 12:02:23,331 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-27 12:02:26,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049953.3333333335, ans=0.1
2023-11-27 12:02:28,718 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 600, loss[loss=0.0481, simple_loss=0.06021, pruned_loss=0.007648, audio_tagging_loss=0.01035, over 16490.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.0912, pruned_loss=0.01273, audio_tagging_loss=0.009069, over 2905622.83 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:02:34,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3050020.0, ans=0.2
2023-11-27 12:03:01,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0
2023-11-27 12:03:07,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3050220.0, ans=0.015
2023-11-27 12:03:20,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-27 12:03:25,687 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 650, loss[loss=0.05446, simple_loss=0.07102, pruned_loss=0.01141, audio_tagging_loss=0.00754, over 14238.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.0914, pruned_loss=0.01289, audio_tagging_loss=0.008979, over 2941608.53 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:03:29,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3050353.3333333335, ans=0.125
2023-11-27 12:03:34,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3050353.3333333335, ans=0.0
2023-11-27 12:03:49,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3050486.6666666665, ans=0.04949747468305833
2023-11-27 12:03:52,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.673e+01 9.304e+01 1.013e+02 1.216e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 12:04:09,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3050553.3333333335, ans=0.125
2023-11-27 12:04:14,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0
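The scaling.py:1022 Whitening lines track how far a module's activations are from "white" (uncorrelated channels with roughly equal variances): the metric is about 1.0 for white features and grows with correlation or unequal scaling, and the Whiten module only intervenes when it exceeds the logged limit. The sketch below reproduces the flavor of that diagnostic; it is a paraphrase of the idea, not guaranteed to be icefall's exact formula:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); ~1.0 when per-group covariance is
    # proportional to the identity, larger when channels are correlated.
    n, c = x.shape
    cpg = c // num_groups                               # channels per group
    x = x.reshape(n, num_groups, cpg).transpose(0, 1)   # (groups, n, cpg)
    cov = torch.matmul(x.transpose(1, 2), x) / n        # per-group covariance
    diag_mean = cov.diagonal(dim1=1, dim2=2).mean()
    covsq_mean = (cov ** 2).sum(dim=(1, 2)).mean() / cpg
    return covsq_mean / (diag_mean ** 2 + 1e-20)

x = torch.randn(10000, 384)                  # near-white features
print(whitening_metric(x))                   # ~1.0
print(whitening_metric(x @ torch.randn(384, 384)))  # correlated -> larger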
2023-11-27 12:04:17,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-27 12:04:22,938 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 700, loss[loss=0.08605, simple_loss=0.1228, pruned_loss=0.01941, audio_tagging_loss=0.005256, over 14776.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09056, pruned_loss=0.01271, audio_tagging_loss=0.008986, over 2964989.71 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:04:31,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. limit=6.0
2023-11-27 12:04:51,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3050820.0, ans=0.125
2023-11-27 12:04:58,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=12.0
2023-11-27 12:05:15,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-27 12:05:20,550 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 750, loss[loss=0.06413, simple_loss=0.09197, pruned_loss=0.008187, audio_tagging_loss=0.009956, over 17139.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09118, pruned_loss=0.01286, audio_tagging_loss=0.008982, over 2990184.23 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:05:38,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051086.6666666665, ans=0.1
2023-11-27 12:05:45,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0
2023-11-27 12:05:46,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.728e+01 9.485e+01 1.039e+02 1.301e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-27 12:06:11,701 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-27 12:06:11,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3051286.6666666665, ans=0.0
2023-11-27 12:06:18,043 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 800, loss[loss=0.06828, simple_loss=0.09783, pruned_loss=0.01164, audio_tagging_loss=0.007727, over 16163.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09019, pruned_loss=0.01276, audio_tagging_loss=0.009111, over 2999033.89 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:06:19,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3051353.3333333335, ans=0.125
2023-11-27 12:06:23,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3051353.3333333335, ans=0.015
2023-11-27 12:06:25,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3051353.3333333335, ans=0.125
2023-11-27 12:06:39,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3051420.0, ans=0.125
2023-11-27 12:06:50,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3051486.6666666665, ans=0.0
2023-11-27 12:06:56,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3051553.3333333335, ans=0.2
2023-11-27 12:07:02,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3051553.3333333335, ans=0.125
2023-11-27 12:07:03,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3051620.0, ans=0.0
2023-11-27 12:07:09,804 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-27 12:07:15,113 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 850, loss[loss=0.04084, simple_loss=0.05573, pruned_loss=0.004772, audio_tagging_loss=0.008205, over 14998.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08948, pruned_loss=0.01251, audio_tagging_loss=0.009177, over 3005260.08 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:07:41,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.486e+01 9.072e+01 9.874e+01 1.616e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-27 12:07:57,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3051886.6666666665, ans=0.1
2023-11-27 12:08:08,067 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-27 12:08:13,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0
2023-11-27 12:08:14,047 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 900, loss[loss=0.07563, simple_loss=0.1034, pruned_loss=0.01461, audio_tagging_loss=0.00933, over 15728.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09013, pruned_loss=0.01258, audio_tagging_loss=0.009197, over 3013398.59 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:08:17,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3052020.0, ans=0.125
2023-11-27 12:08:34,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0
2023-11-27 12:08:36,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3052153.3333333335, ans=10.0
2023-11-27 12:08:39,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5
2023-11-27 12:08:40,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3052153.3333333335, ans=0.0
2023-11-27 12:08:41,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-27 12:08:49,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3052220.0, ans=0.1
2023-11-27 12:08:56,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-27 12:08:57,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-27 12:09:02,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3052286.6666666665, ans=0.95
2023-11-27 12:09:05,809 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-27 12:09:11,248 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 950, loss[loss=0.07059, simple_loss=0.09891, pruned_loss=0.01351, audio_tagging_loss=0.007624, over 14388.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09063, pruned_loss=0.01258, audio_tagging_loss=0.009125, over 3032590.93 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:09:18,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3052353.3333333335, ans=0.125
2023-11-27 12:09:38,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.734e+01 9.231e+01 1.017e+02 1.316e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-27 12:09:44,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3052486.6666666665, ans=0.0
2023-11-27 12:09:48,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3052553.3333333335, ans=0.2
2023-11-27 12:10:00,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3052620.0, ans=0.125
2023-11-27 12:10:03,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-27 12:10:03,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.47 vs. limit=10.0
2023-11-27 12:10:08,364 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1000, loss[loss=0.06971, simple_loss=0.09482, pruned_loss=0.01233, audio_tagging_loss=0.009964, over 15218.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09034, pruned_loss=0.01248, audio_tagging_loss=0.008861, over 3034072.49 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:10:10,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3052686.6666666665, ans=0.025
2023-11-27 12:10:34,306 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:10:35,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0
2023-11-27 12:10:50,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052886.6666666665, ans=0.1
2023-11-27 12:11:00,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-27 12:11:04,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3053020.0, ans=0.125
2023-11-27 12:11:05,654 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1050, loss[loss=0.06907, simple_loss=0.1008, pruned_loss=0.01105, audio_tagging_loss=0.007631, over 15262.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08968, pruned_loss=0.01244, audio_tagging_loss=0.008838, over 3037726.34 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:11:13,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3053020.0, ans=0.2
2023-11-27 12:11:14,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3053020.0, ans=0.1
2023-11-27 12:11:20,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2023-11-27 12:11:31,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3053153.3333333335, ans=0.0
2023-11-27 12:11:32,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.482e+01 9.051e+01 9.992e+01 1.169e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-27 12:11:32,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:11:43,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3053220.0, ans=0.125
2023-11-27 12:11:47,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0
2023-11-27 12:11:47,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3053220.0, ans=0.2
2023-11-27 12:11:50,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3053286.6666666665, ans=15.0
2023-11-27 12:11:52,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-27 12:11:56,568 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458000
2023-11-27 12:11:59,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3053286.6666666665, ans=0.2
2023-11-27 12:12:02,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0
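The WARNING above comes from a validity filter on the mixed-in AudioSet cuts, which carry only a dummy transcript: a 1-second cut has 100 feature frames, the convolutional frontend reduces that to 23 encoder frames, and the transducer loss cannot emit 24 BPE tokens in only 23 frames, so the cut is excluded. A sketch of the check; the subsampling arithmetic matches the logged numbers for subsampling_factor=4, and the function names are illustrative:

def frames_after_subsampling(num_frames: int) -> int:
    # Matches the log: (100 - 7) // 2 + 1 = 47, then 47 // 2 = 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The transducer needs at least one encoder frame per emitted token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." above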
2023-11-27 12:12:02,646 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1100, loss[loss=0.05795, simple_loss=0.08905, pruned_loss=0.007107, audio_tagging_loss=0.006319, over 15120.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0896, pruned_loss=0.01255, audio_tagging_loss=0.008802, over 3040049.83 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:12:07,046 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:12:07,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3053353.3333333335, ans=0.2
2023-11-27 12:12:13,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5
2023-11-27 12:12:14,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=22.5
2023-11-27 12:12:21,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3053420.0, ans=0.125
2023-11-27 12:12:52,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3053620.0, ans=0.125
2023-11-27 12:12:54,112 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-27 12:12:57,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3053620.0, ans=0.0
2023-11-27 12:12:59,557 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1150, loss[loss=0.04793, simple_loss=0.06109, pruned_loss=0.008441, audio_tagging_loss=0.008944, over 14627.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09026, pruned_loss=0.0127, audio_tagging_loss=0.008769, over 3042835.95 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:13:04,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3053686.6666666665, ans=0.125
2023-11-27 12:13:09,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3053686.6666666665, ans=0.125
2023-11-27 12:13:27,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.562e+01 9.025e+01 9.989e+01 1.460e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-27 12:13:28,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3053820.0, ans=0.1
2023-11-27 12:13:38,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3053886.6666666665, ans=0.0
2023-11-27 12:13:47,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3053953.3333333335, ans=0.125
2023-11-27 12:13:49,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0
2023-11-27 12:13:50,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3053953.3333333335, ans=0.125
2023-11-27 12:13:50,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-27 12:13:57,001 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1200, loss[loss=0.07762, simple_loss=0.1109, pruned_loss=0.01432, audio_tagging_loss=0.00783, over 14682.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09055, pruned_loss=0.01286, audio_tagging_loss=0.008778, over 3039008.87 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:14:26,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3054153.3333333335, ans=0.2
2023-11-27 12:14:47,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-27 12:14:52,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0
2023-11-27 12:14:53,251 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1250, loss[loss=0.07463, simple_loss=0.1023, pruned_loss=0.01235, audio_tagging_loss=0.01115, over 13868.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09095, pruned_loss=0.01293, audio_tagging_loss=0.008844, over 3040269.99 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:15:02,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3054353.3333333335, ans=0.125
2023-11-27 12:15:04,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3054420.0, ans=0.125
2023-11-27 12:15:11,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5
2023-11-27 12:15:16,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3054486.6666666665, ans=0.125
2023-11-27 12:15:16,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3054486.6666666665, ans=0.025
2023-11-27 12:15:19,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3054486.6666666665, ans=0.125
2023-11-27 12:15:20,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0
2023-11-27 12:15:21,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.636e+01 9.164e+01 9.922e+01 1.354e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-27 12:15:37,677 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:15:44,094 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-27 12:15:50,099 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1300, loss[loss=0.05803, simple_loss=0.0736, pruned_loss=0.01086, audio_tagging_loss=0.01037, over 14456.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09036, pruned_loss=0.01272, audio_tagging_loss=0.008764, over 3040899.54 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:15:50,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054686.6666666665, ans=0.1
2023-11-27 12:15:58,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3054686.6666666665, ans=0.125
2023-11-27 12:16:04,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3054753.3333333335, ans=0.125
2023-11-27 12:16:05,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3054753.3333333335, ans=0.0
2023-11-27 12:16:27,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3054886.6666666665, ans=0.125
2023-11-27 12:16:34,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3054953.3333333335, ans=0.07
2023-11-27 12:16:36,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0
2023-11-27 12:16:40,693 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-27 12:16:46,973 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1350, loss[loss=0.07169, simple_loss=0.09794, pruned_loss=0.01391, audio_tagging_loss=0.008808, over 16486.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09037, pruned_loss=0.01268, audio_tagging_loss=0.008705, over 3043243.38 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:16:49,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3055020.0, ans=0.125
2023-11-27 12:16:49,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3055020.0, ans=0.125
2023-11-27 12:16:55,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0
2023-11-27 12:17:13,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.626e+01 9.162e+01 9.999e+01 1.247e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-27 12:17:22,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0
2023-11-27 12:17:31,797 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:17:38,554 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-27 12:17:43,926 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1400, loss[loss=0.05879, simple_loss=0.07941, pruned_loss=0.009526, audio_tagging_loss=0.009559, over 15477.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0905, pruned_loss=0.01266, audio_tagging_loss=0.008739, over 3045140.05 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:17:56,178 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:18:25,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3055553.3333333335, ans=0.125
2023-11-27 12:18:27,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3055553.3333333335, ans=0.0
2023-11-27 12:18:35,164 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-27 12:18:40,529 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1450, loss[loss=0.08081, simple_loss=0.114, pruned_loss=0.01525, audio_tagging_loss=0.008556, over 14605.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09022, pruned_loss=0.01257, audio_tagging_loss=0.008803, over 3042363.62 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:18:41,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3055686.6666666665, ans=0.125
2023-11-27 12:18:51,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3055753.3333333335, ans=0.125
2023-11-27 12:19:05,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3055820.0, ans=0.125
2023-11-27 12:19:06,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3055820.0, ans=0.125
2023-11-27 12:19:08,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.606e+01 9.355e+01 1.005e+02 1.686e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 12:19:28,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3055953.3333333335, ans=0.125
2023-11-27 12:19:31,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-27 12:19:37,791 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1500, loss[loss=0.05216, simple_loss=0.07385, pruned_loss=0.006225, audio_tagging_loss=0.00901, over 15216.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09024, pruned_loss=0.01258, audio_tagging_loss=0.00886, over 3044554.02 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:20:05,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3056153.3333333335, ans=0.09899494936611666
2023-11-27 12:20:10,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3056220.0, ans=0.125
2023-11-27 12:20:11,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3056220.0, ans=0.125
2023-11-27 12:20:29,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3056286.6666666665, ans=0.0
2023-11-27 12:20:30,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-27 12:20:35,428 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1550, loss[loss=0.06796, simple_loss=0.08956, pruned_loss=0.01281, audio_tagging_loss=0.01038, over 16173.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09148, pruned_loss=0.01278, audio_tagging_loss=0.008946, over 3044584.08 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:20:37,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3056353.3333333335, ans=0.07
2023-11-27 12:21:01,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.639e+01 9.120e+01 9.883e+01 1.538e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:21:06,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3056486.6666666665, ans=0.125
2023-11-27 12:21:07,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0
2023-11-27 12:21:26,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-27 12:21:31,985 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1600, loss[loss=0.06988, simple_loss=0.09454, pruned_loss=0.0143, audio_tagging_loss=0.008312, over 16978.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09126, pruned_loss=0.01271, audio_tagging_loss=0.009011, over 3046787.52 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:21:32,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3056686.6666666665, ans=0.2
2023-11-27 12:21:35,553 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:21:53,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3056753.3333333335, ans=0.1
2023-11-27 12:22:04,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3056820.0, ans=0.0
2023-11-27 12:22:17,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3056953.3333333335, ans=0.0
2023-11-27 12:22:23,400 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458550
2023-11-27 12:22:23,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=12.0
2023-11-27 12:22:28,726 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1650, loss[loss=0.07346, simple_loss=0.09519, pruned_loss=0.01353, audio_tagging_loss=0.01233, over 15127.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09096, pruned_loss=0.01264, audio_tagging_loss=0.008994, over 3052217.51 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:22:34,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3057020.0, ans=0.0
2023-11-27 12:22:38,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3057020.0, ans=0.04949747468305833
2023-11-27 12:22:43,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3057086.6666666665, ans=0.0
2023-11-27 12:22:47,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3057086.6666666665, ans=0.125
2023-11-27 12:22:57,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.745e+01 9.326e+01 9.935e+01 1.288e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-27 12:22:58,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3057153.3333333335, ans=0.125
2023-11-27 12:23:22,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458600
2023-11-27 12:23:29,013 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1700, loss[loss=0.06393, simple_loss=0.09185, pruned_loss=0.009841, audio_tagging_loss=0.008158, over 14886.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09095, pruned_loss=0.0127, audio_tagging_loss=0.009053, over 3055619.50 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:23:29,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.06 vs. limit=15.0
2023-11-27 12:23:51,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3057486.6666666665, ans=0.0
2023-11-27 12:23:58,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3057486.6666666665, ans=0.2
2023-11-27 12:24:00,956 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:24:20,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458650
2023-11-27 12:24:25,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3057686.6666666665, ans=0.2
2023-11-27 12:24:26,128 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1750, loss[loss=0.06713, simple_loss=0.1008, pruned_loss=0.01068, audio_tagging_loss=0.006073, over 15107.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09047, pruned_loss=0.01247, audio_tagging_loss=0.008981, over 3059032.84 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:24:30,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3057686.6666666665, ans=0.125
2023-11-27 12:24:37,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3057753.3333333335, ans=0.125
2023-11-27 12:24:53,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3057820.0, ans=0.1
2023-11-27 12:24:54,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.466e+01 9.121e+01 9.743e+01 1.211e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:25:17,844 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458700
2023-11-27 12:25:21,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3057953.3333333335, ans=0.125
2023-11-27 12:25:23,183 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1800, loss[loss=0.07465, simple_loss=0.1066, pruned_loss=0.01441, audio_tagging_loss=0.006947, over 15223.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.091, pruned_loss=0.0128, audio_tagging_loss=0.008899, over 3056005.71 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:25:50,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3058153.3333333335, ans=0.0
2023-11-27 12:25:56,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058153.3333333335, ans=0.1
2023-11-27 12:26:08,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3058286.6666666665, ans=0.125
2023-11-27 12:26:16,830 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458750
2023-11-27 12:26:22,254 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1850, loss[loss=0.08604, simple_loss=0.1204, pruned_loss=0.02011, audio_tagging_loss=0.005722, over 15953.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.0912, pruned_loss=0.01279, audio_tagging_loss=0.008809, over 3054834.08 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:26:39,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0
2023-11-27 12:26:39,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3058420.0, ans=0.0
2023-11-27 12:26:50,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=22.5
2023-11-27 12:26:50,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.879e+01 9.441e+01 1.010e+02 1.422e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-27 12:26:50,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058486.6666666665, ans=0.1
2023-11-27 12:27:05,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3058553.3333333335, ans=0.125
2023-11-27 12:27:06,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3058553.3333333335, ans=0.125
2023-11-27 12:27:08,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3058620.0, ans=0.125
2023-11-27 12:27:13,956 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458800
2023-11-27 12:27:20,579 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1900, loss[loss=0.08051, simple_loss=0.1099, pruned_loss=0.01598, audio_tagging_loss=0.009559, over 15841.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09191, pruned_loss=0.01298, audio_tagging_loss=0.008713, over 3052741.04 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:27:24,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3058686.6666666665, ans=0.125
2023-11-27 12:27:54,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3058886.6666666665, ans=0.0
2023-11-27 12:27:54,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2023-11-27 12:28:12,458 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458850
2023-11-27 12:28:12,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0
2023-11-27 12:28:17,871 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 1950, loss[loss=0.07026, simple_loss=0.09823, pruned_loss=0.01421, audio_tagging_loss=0.006937, over 15877.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09161, pruned_loss=0.0129, audio_tagging_loss=0.00873, over 3051895.08 frames.
], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:28:45,647 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:28:46,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.494e+01 8.981e+01 9.864e+01 1.306e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 12:29:00,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3059220.0, ans=0.125 2023-11-27 12:29:11,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-27 12:29:14,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3059286.6666666665, ans=0.0 2023-11-27 12:29:15,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3059353.3333333335, ans=0.0 2023-11-27 12:29:16,529 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2000, loss[loss=0.07523, simple_loss=0.1092, pruned_loss=0.01311, audio_tagging_loss=0.007518, over 15284.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09153, pruned_loss=0.01298, audio_tagging_loss=0.008781, over 3046649.84 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:29:19,024 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:29:40,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-27 12:29:42,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3059486.6666666665, ans=0.0 2023-11-27 12:29:53,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3059553.3333333335, ans=0.125 2023-11-27 12:30:08,140 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-27 12:30:13,674 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2050, loss[loss=0.05956, simple_loss=0.08337, pruned_loss=0.01112, audio_tagging_loss=0.006752, over 14892.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09155, pruned_loss=0.01294, audio_tagging_loss=0.00875, over 3050095.68 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:30:34,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059753.3333333335, ans=0.1 2023-11-27 12:30:37,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3059820.0, ans=0.0 2023-11-27 12:30:39,076 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:30:43,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.669e+01 9.305e+01 9.867e+01 1.531e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 12:30:43,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059820.0, ans=0.1 2023-11-27 12:30:51,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3059886.6666666665, ans=0.0 2023-11-27 12:31:06,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459000 2023-11-27 12:31:09,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3059953.3333333335, ans=0.125 2023-11-27 12:31:12,195 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2100, loss[loss=0.07472, simple_loss=0.09355, pruned_loss=0.01896, audio_tagging_loss=0.008988, over 15571.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09262, pruned_loss=0.01326, audio_tagging_loss=0.00861, over 3058117.32 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:31:18,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3060020.0, ans=6.0 2023-11-27 12:32:04,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459050 2023-11-27 12:32:06,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-27 12:32:10,924 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2150, loss[loss=0.07266, simple_loss=0.1024, pruned_loss=0.01461, audio_tagging_loss=0.006845, over 16121.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09199, pruned_loss=0.01315, audio_tagging_loss=0.008653, over 3057096.27 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:32:17,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3060353.3333333335, ans=0.0 2023-11-27 12:32:39,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.486e+01 9.121e+01 9.969e+01 1.223e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 12:32:45,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3060553.3333333335, ans=0.125 2023-11-27 12:32:47,759 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 12:33:02,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459100 2023-11-27 12:33:07,661 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2200, loss[loss=0.07195, simple_loss=0.09855, pruned_loss=0.01436, audio_tagging_loss=0.008319, over 15240.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09228, pruned_loss=0.01316, audio_tagging_loss=0.008615, over 3049970.94 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:33:17,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060753.3333333335, ans=0.1 2023-11-27 12:33:22,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3060753.3333333335, ans=0.125 2023-11-27 12:33:27,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-27 12:33:42,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3060886.6666666665, ans=0.125 2023-11-27 12:33:49,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3060886.6666666665, ans=0.125 2023-11-27 12:33:55,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-27 12:33:59,429 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459150 2023-11-27 12:34:04,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3061020.0, ans=0.0 2023-11-27 12:34:05,526 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2250, loss[loss=0.08549, simple_loss=0.1123, pruned_loss=0.01919, audio_tagging_loss=0.01016, over 15892.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.0927, pruned_loss=0.01322, audio_tagging_loss=0.0086, over 3050605.72 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:34:06,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061020.0, ans=0.1 2023-11-27 12:34:11,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2023-11-27 12:34:35,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.97 vs. 
limit=15.0 2023-11-27 12:34:36,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.598e+01 9.189e+01 9.920e+01 1.225e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 12:34:38,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3061153.3333333335, ans=0.2 2023-11-27 12:34:45,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3061220.0, ans=0.125 2023-11-27 12:34:52,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3061286.6666666665, ans=0.2 2023-11-27 12:34:57,820 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459200 2023-11-27 12:35:05,897 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2300, loss[loss=0.06321, simple_loss=0.09631, pruned_loss=0.009378, audio_tagging_loss=0.005679, over 15303.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09223, pruned_loss=0.01307, audio_tagging_loss=0.008684, over 3051430.13 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:35:08,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3061353.3333333335, ans=0.125 2023-11-27 12:35:20,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.76 vs. limit=15.0 2023-11-27 12:35:44,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3061553.3333333335, ans=0.125 2023-11-27 12:35:50,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3061553.3333333335, ans=0.125 2023-11-27 12:35:57,645 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459250 2023-11-27 12:35:58,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2023-11-27 12:35:59,782 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:36:03,042 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2350, loss[loss=0.06365, simple_loss=0.0813, pruned_loss=0.01439, audio_tagging_loss=0.008612, over 14517.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09124, pruned_loss=0.01302, audio_tagging_loss=0.00875, over 3044866.38 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:36:21,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3061753.3333333335, ans=0.125 2023-11-27 12:36:34,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.777e+01 9.429e+01 1.022e+02 1.253e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 12:36:54,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3061953.3333333335, ans=0.125 2023-11-27 12:36:55,310 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459300 2023-11-27 12:37:00,897 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2400, loss[loss=0.06891, simple_loss=0.09963, pruned_loss=0.01163, audio_tagging_loss=0.007462, over 15944.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09159, pruned_loss=0.01296, audio_tagging_loss=0.008794, over 3041195.58 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:37:04,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3062020.0, ans=0.125 2023-11-27 12:37:11,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2023-11-27 12:37:20,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3062086.6666666665, ans=0.2 2023-11-27 12:37:27,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.99 vs. limit=22.5 2023-11-27 12:37:28,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=15.0 2023-11-27 12:37:31,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3062153.3333333335, ans=0.125 2023-11-27 12:37:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3062153.3333333335, ans=0.025 2023-11-27 12:37:53,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459350 2023-11-27 12:37:53,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3062286.6666666665, ans=0.0 2023-11-27 12:37:59,589 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2450, loss[loss=0.08513, simple_loss=0.1159, pruned_loss=0.02192, audio_tagging_loss=0.005275, over 15373.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09087, pruned_loss=0.01266, audio_tagging_loss=0.008921, over 3037967.70 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:38:28,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.321e+01 9.410e+01 9.969e+01 1.274e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 12:38:35,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062553.3333333335, ans=0.1 2023-11-27 12:38:51,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459400 2023-11-27 12:38:51,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3062620.0, ans=0.2 2023-11-27 12:38:57,357 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2500, loss[loss=0.065, simple_loss=0.08132, pruned_loss=0.01494, audio_tagging_loss=0.009401, over 15043.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09071, pruned_loss=0.01267, audio_tagging_loss=0.009022, over 3038590.20 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:39:07,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3062753.3333333335, ans=0.0 2023-11-27 12:39:13,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3062753.3333333335, ans=0.2 2023-11-27 12:39:21,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3062820.0, ans=0.125 2023-11-27 12:39:22,766 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:39:46,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3062953.3333333335, ans=0.0 2023-11-27 12:39:49,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459450 2023-11-27 12:39:54,754 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2550, loss[loss=0.06538, simple_loss=0.09071, pruned_loss=0.01268, audio_tagging_loss=0.007344, over 14600.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09017, pruned_loss=0.01257, audio_tagging_loss=0.008927, over 3041748.25 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:39:54,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3063020.0, ans=0.2 2023-11-27 12:40:26,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.678e+01 9.247e+01 1.003e+02 1.223e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 12:40:28,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5 2023-11-27 12:40:42,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3063286.6666666665, ans=0.125 2023-11-27 12:40:46,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459500 2023-11-27 12:40:51,928 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2600, loss[loss=0.07187, simple_loss=0.1035, pruned_loss=0.0133, audio_tagging_loss=0.006817, over 14974.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09049, pruned_loss=0.01261, audio_tagging_loss=0.008892, over 3047947.30 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:40:53,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3063353.3333333335, ans=0.1 2023-11-27 12:41:01,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3063353.3333333335, ans=0.05 2023-11-27 12:41:05,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0 2023-11-27 12:41:06,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3063420.0, ans=0.125 2023-11-27 12:41:06,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3063420.0, ans=0.125 2023-11-27 12:41:06,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3063420.0, ans=0.04949747468305833 2023-11-27 12:41:45,387 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459550 2023-11-27 12:41:46,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-27 12:41:47,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3063620.0, ans=0.0 2023-11-27 12:41:50,869 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2650, loss[loss=0.0682, simple_loss=0.08747, pruned_loss=0.01291, audio_tagging_loss=0.01155, over 15265.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08988, pruned_loss=0.01275, audio_tagging_loss=0.00882, over 3041780.68 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:41:52,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3063686.6666666665, ans=0.125 2023-11-27 12:42:09,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3063753.3333333335, ans=0.0 2023-11-27 12:42:20,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.348e+01 9.301e+01 1.026e+02 1.495e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 12:42:20,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=22.5 2023-11-27 12:42:42,215 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459600 2023-11-27 12:42:48,208 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2700, loss[loss=0.05249, simple_loss=0.0737, pruned_loss=0.007784, audio_tagging_loss=0.007853, over 15049.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08937, pruned_loss=0.01256, audio_tagging_loss=0.00877, over 3046149.91 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:42:49,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3064020.0, ans=0.2 2023-11-27 12:42:59,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3064086.6666666665, ans=0.0 2023-11-27 12:43:01,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3064086.6666666665, ans=0.2 2023-11-27 12:43:29,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2023-11-27 12:43:32,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3064220.0, ans=0.125 2023-11-27 12:43:39,943 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459650 2023-11-27 12:43:43,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3064286.6666666665, ans=0.05 2023-11-27 12:43:43,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3064286.6666666665, ans=0.0 2023-11-27 12:43:45,274 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2750, loss[loss=0.08183, simple_loss=0.1066, pruned_loss=0.01912, audio_tagging_loss=0.009389, over 14945.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0897, pruned_loss=0.01244, audio_tagging_loss=0.008745, over 3046542.82 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:43:52,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3064353.3333333335, ans=0.125 2023-11-27 12:44:01,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3064420.0, ans=0.0 2023-11-27 12:44:07,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3064420.0, ans=0.125 2023-11-27 12:44:11,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2023-11-27 12:44:17,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.367e+01 8.947e+01 9.823e+01 1.478e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-27 12:44:38,966 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:44:39,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459700 2023-11-27 12:44:44,998 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2800, loss[loss=0.06725, simple_loss=0.08946, pruned_loss=0.01254, audio_tagging_loss=0.009976, over 15467.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09001, pruned_loss=0.01251, audio_tagging_loss=0.00872, over 3049581.12 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:44:45,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0 2023-11-27 12:44:52,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3064686.6666666665, ans=0.0 2023-11-27 12:45:03,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3064753.3333333335, ans=0.125 2023-11-27 12:45:04,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3064753.3333333335, ans=0.2 2023-11-27 12:45:16,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3064820.0, ans=0.0 2023-11-27 12:45:27,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2023-11-27 12:45:36,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459750 2023-11-27 12:45:42,194 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2850, loss[loss=0.07107, simple_loss=0.103, pruned_loss=0.01438, audio_tagging_loss=0.00519, over 14338.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0897, pruned_loss=0.01243, audio_tagging_loss=0.008637, over 3049344.18 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:45:44,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3065020.0, ans=0.0 2023-11-27 12:45:46,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3065020.0, ans=0.125 2023-11-27 12:46:14,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.447e+01 9.117e+01 9.906e+01 1.324e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-27 12:46:16,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3065220.0, ans=0.2 2023-11-27 12:46:34,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459800 2023-11-27 12:46:37,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3065286.6666666665, ans=0.1 2023-11-27 12:46:40,267 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2900, loss[loss=0.07179, simple_loss=0.09434, pruned_loss=0.01703, audio_tagging_loss=0.007591, over 15424.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08991, pruned_loss=0.01255, audio_tagging_loss=0.008644, over 3043870.83 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:46:42,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=22.5 2023-11-27 12:46:53,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3065420.0, ans=0.2 2023-11-27 12:46:56,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065420.0, ans=0.1 2023-11-27 12:47:02,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3065420.0, ans=0.0 2023-11-27 12:47:04,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3065486.6666666665, ans=0.1 2023-11-27 12:47:09,107 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:47:20,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3065553.3333333335, ans=0.125 2023-11-27 12:47:20,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2023-11-27 12:47:33,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459850 2023-11-27 12:47:39,700 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 2950, loss[loss=0.07318, simple_loss=0.09879, pruned_loss=0.01524, audio_tagging_loss=0.008544, over 16031.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08873, pruned_loss=0.01247, audio_tagging_loss=0.00882, over 3036076.93 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:47:47,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3065686.6666666665, ans=0.125 2023-11-27 12:47:48,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. 
limit=10.0 2023-11-27 12:47:50,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3065753.3333333335, ans=0.09899494936611666 2023-11-27 12:47:56,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3065753.3333333335, ans=0.1 2023-11-27 12:48:01,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3065820.0, ans=0.1 2023-11-27 12:48:01,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3065820.0, ans=0.125 2023-11-27 12:48:03,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3065820.0, ans=0.0 2023-11-27 12:48:07,051 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.520e-03 2023-11-27 12:48:09,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3065820.0, ans=0.125 2023-11-27 12:48:10,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.648e+01 9.263e+01 1.004e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 12:48:30,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-27 12:48:32,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459900 2023-11-27 12:48:32,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-27 12:48:33,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-27 12:48:37,509 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3000, loss[loss=0.06083, simple_loss=0.07749, pruned_loss=0.01219, audio_tagging_loss=0.009893, over 14925.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08974, pruned_loss=0.01273, audio_tagging_loss=0.008848, over 3036893.60 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:48:37,510 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 12:49:11,853 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05767, simple_loss=0.05074, pruned_loss=0.005233, audio_tagging_loss=0.02707, over 4681554.00 frames. 
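Note on the loss fields in this log: across the entries above, the printed loss matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss to within rounding; for the validation entry just above, 0.5 * 0.05074 + 0.005233 + 0.02707 = 0.057673 ≈ 0.05767. A minimal Python check of that decomposition follows; the 0.5 weight on simple_loss and the 1.0 weight on audio_tagging_loss are inferred from the logged numbers, not stated anywhere in this section.

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_scale=0.5, audio_tagging_scale=1.0):
    # Weighted sum consistent with the logged values; the scales are
    # inferred assumptions, not values read from this log.
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Validation entry above: loss=0.05767
assert abs(total_loss(0.05074, 0.005233, 0.02707) - 0.05767) < 1e-4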
2023-11-27 12:49:11,853 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 12:49:14,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3066020.0, ans=0.5 2023-11-27 12:49:38,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3066153.3333333335, ans=0.05 2023-11-27 12:49:43,748 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:49:43,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3066153.3333333335, ans=0.125 2023-11-27 12:49:52,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2023-11-27 12:50:05,471 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 459950 2023-11-27 12:50:05,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066286.6666666665, ans=0.1 2023-11-27 12:50:10,839 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3050, loss[loss=0.05679, simple_loss=0.06936, pruned_loss=0.01108, audio_tagging_loss=0.01103, over 13676.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08966, pruned_loss=0.01266, audio_tagging_loss=0.008888, over 3030694.82 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:50:15,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3066353.3333333335, ans=0.125 2023-11-27 12:50:26,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3066420.0, ans=0.0 2023-11-27 12:50:33,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3066486.6666666665, ans=0.0 2023-11-27 12:50:42,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.816e+01 9.327e+01 1.004e+02 1.225e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 12:50:47,050 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 12:50:48,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3066553.3333333335, ans=0.125 2023-11-27 12:50:50,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3066553.3333333335, ans=0.125 2023-11-27 12:50:53,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3066553.3333333335, ans=0.0 2023-11-27 12:51:03,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460000 2023-11-27 12:51:04,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-27 12:51:11,794 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3100, loss[loss=0.07576, simple_loss=0.1002, pruned_loss=0.01493, audio_tagging_loss=0.0107, over 13366.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08975, pruned_loss=0.01289, audio_tagging_loss=0.008923, over 3037789.34 frames. ], batch size: 51, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:51:18,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3066686.6666666665, ans=0.125 2023-11-27 12:51:34,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-27 12:52:01,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3066953.3333333335, ans=0.1 2023-11-27 12:52:04,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460050 2023-11-27 12:52:09,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-11-27 12:52:09,593 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3150, loss[loss=0.07167, simple_loss=0.09735, pruned_loss=0.01381, audio_tagging_loss=0.009182, over 15003.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09021, pruned_loss=0.01278, audio_tagging_loss=0.009024, over 3039435.09 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:52:42,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.556e+01 9.152e+01 9.802e+01 1.189e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 12:53:02,628 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460100 2023-11-27 12:53:04,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2023-11-27 12:53:08,854 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3200, loss[loss=0.04885, simple_loss=0.06191, pruned_loss=0.007649, audio_tagging_loss=0.01024, over 13748.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09035, pruned_loss=0.01266, audio_tagging_loss=0.009, over 3039756.95 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:53:19,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. 
limit=15.0 2023-11-27 12:53:27,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3067420.0, ans=0.125 2023-11-27 12:53:38,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2023-11-27 12:53:40,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3067486.6666666665, ans=0.125 2023-11-27 12:53:49,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3067553.3333333335, ans=0.125 2023-11-27 12:54:01,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460150 2023-11-27 12:54:06,399 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3250, loss[loss=0.04859, simple_loss=0.0638, pruned_loss=0.006589, audio_tagging_loss=0.0101, over 15283.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0902, pruned_loss=0.01264, audio_tagging_loss=0.009126, over 3037821.81 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:54:13,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3067686.6666666665, ans=0.125 2023-11-27 12:54:34,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067820.0, ans=0.1 2023-11-27 12:54:34,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3067820.0, ans=0.125 2023-11-27 12:54:39,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.843e+01 9.410e+01 1.014e+02 1.334e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 12:54:59,304 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460200 2023-11-27 12:55:03,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3067953.3333333335, ans=0.0 2023-11-27 12:55:05,135 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3300, loss[loss=0.08554, simple_loss=0.1107, pruned_loss=0.0201, audio_tagging_loss=0.0101, over 14210.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09087, pruned_loss=0.01291, audio_tagging_loss=0.009175, over 3036567.12 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:55:06,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3068020.0, ans=0.2 2023-11-27 12:55:08,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-27 12:55:09,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. 
limit=15.0 2023-11-27 12:55:10,894 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:55:19,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3068086.6666666665, ans=0.05 2023-11-27 12:55:28,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3068153.3333333335, ans=0.0 2023-11-27 12:55:29,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-27 12:55:37,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-27 12:55:42,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3068220.0, ans=0.125 2023-11-27 12:55:58,066 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460250 2023-11-27 12:55:58,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3068286.6666666665, ans=0.09899494936611666 2023-11-27 12:56:04,635 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3350, loss[loss=0.07031, simple_loss=0.09465, pruned_loss=0.01359, audio_tagging_loss=0.009399, over 16000.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.0903, pruned_loss=0.01275, audio_tagging_loss=0.00907, over 3036397.52 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:56:10,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3068353.3333333335, ans=0.125 2023-11-27 12:56:36,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.562e+01 9.285e+01 9.934e+01 1.474e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 12:56:56,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2023-11-27 12:56:56,889 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460300 2023-11-27 12:57:01,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3068686.6666666665, ans=0.0 2023-11-27 12:57:02,298 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3400, loss[loss=0.05673, simple_loss=0.07814, pruned_loss=0.00977, audio_tagging_loss=0.007889, over 14267.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09024, pruned_loss=0.01266, audio_tagging_loss=0.008892, over 3032161.81 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:57:16,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3068753.3333333335, ans=0.125 2023-11-27 12:57:20,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3068753.3333333335, ans=0.0 2023-11-27 12:57:45,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. 
limit=22.5 2023-11-27 12:57:54,081 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460350 2023-11-27 12:57:57,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3068953.3333333335, ans=10.0 2023-11-27 12:58:00,427 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3450, loss[loss=0.05671, simple_loss=0.07601, pruned_loss=0.01346, audio_tagging_loss=0.005241, over 14240.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09034, pruned_loss=0.01277, audio_tagging_loss=0.008824, over 3028008.11 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:58:12,178 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:58:17,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3069086.6666666665, ans=0.5 2023-11-27 12:58:20,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3069086.6666666665, ans=0.05 2023-11-27 12:58:32,869 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.564e+01 9.034e+01 9.987e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 12:58:41,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-11-27 12:58:43,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3069220.0, ans=0.125 2023-11-27 12:58:50,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069286.6666666665, ans=0.1 2023-11-27 12:58:50,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3069286.6666666665, ans=0.2 2023-11-27 12:58:52,145 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460400 2023-11-27 12:58:53,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3069286.6666666665, ans=0.09899494936611666 2023-11-27 12:58:58,879 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3500, loss[loss=0.07308, simple_loss=0.1001, pruned_loss=0.01482, audio_tagging_loss=0.008218, over 16192.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08997, pruned_loss=0.0129, audio_tagging_loss=0.008773, over 3026824.11 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:59:06,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-27 12:59:18,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-27 12:59:30,593 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 12:59:30,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3069486.6666666665, ans=0.0 2023-11-27 12:59:34,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3069553.3333333335, ans=0.125 2023-11-27 12:59:45,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.49 vs. limit=10.0 2023-11-27 12:59:51,338 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460450 2023-11-27 12:59:51,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3069620.0, ans=0.125 2023-11-27 12:59:56,782 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3550, loss[loss=0.0822, simple_loss=0.1033, pruned_loss=0.02045, audio_tagging_loss=0.01009, over 14953.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09003, pruned_loss=0.0129, audio_tagging_loss=0.008709, over 3031148.27 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:00:19,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. limit=5.0 2023-11-27 13:00:21,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3069820.0, ans=0.0 2023-11-27 13:00:30,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3069820.0, ans=0.1 2023-11-27 13:00:30,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.573e+01 9.051e+01 9.738e+01 1.451e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-27 13:00:45,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3069953.3333333335, ans=0.125 2023-11-27 13:00:48,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460500 2023-11-27 13:00:50,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3069953.3333333335, ans=0.125 2023-11-27 13:00:54,038 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3600, loss[loss=0.08289, simple_loss=0.1109, pruned_loss=0.02046, audio_tagging_loss=0.006961, over 15631.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09044, pruned_loss=0.013, audio_tagging_loss=0.008668, over 3035092.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:01:25,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3070153.3333333335, ans=0.125 2023-11-27 13:01:32,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-27 13:01:36,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3070220.0, ans=0.0 2023-11-27 13:01:46,328 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460550 2023-11-27 13:01:52,380 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3650, loss[loss=0.05564, simple_loss=0.06999, pruned_loss=0.01212, audio_tagging_loss=0.008523, over 15627.00 frames. 
], tot_loss[loss=0.06687, simple_loss=0.09021, pruned_loss=0.01309, audio_tagging_loss=0.008672, over 3032975.08 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:02:09,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3070420.0, ans=0.1 2023-11-27 13:02:10,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3070420.0, ans=0.125 2023-11-27 13:02:12,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3070420.0, ans=0.125 2023-11-27 13:02:18,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-11-27 13:02:25,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.646e+01 9.266e+01 1.020e+02 1.594e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 13:02:31,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3070553.3333333335, ans=0.0 2023-11-27 13:02:45,673 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460600 2023-11-27 13:02:48,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=22.5 2023-11-27 13:02:51,412 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3700, loss[loss=0.03832, simple_loss=0.04755, pruned_loss=0.007255, audio_tagging_loss=0.007293, over 14993.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09058, pruned_loss=0.01323, audio_tagging_loss=0.008682, over 3041788.03 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:03:10,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3070753.3333333335, ans=0.125 2023-11-27 13:03:18,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070820.0, ans=0.1 2023-11-27 13:03:43,361 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460650 2023-11-27 13:03:48,836 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3750, loss[loss=0.07337, simple_loss=0.1007, pruned_loss=0.0151, audio_tagging_loss=0.007906, over 15367.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09056, pruned_loss=0.01311, audio_tagging_loss=0.008782, over 3044270.74 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:03:50,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3071020.0, ans=0.2 2023-11-27 13:03:52,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2023-11-27 13:03:57,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2023-11-27 13:04:03,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. 
limit=15.0 2023-11-27 13:04:05,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3071086.6666666665, ans=0.125 2023-11-27 13:04:23,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.737e+01 9.313e+01 1.018e+02 1.236e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:04:32,117 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:04:34,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3071286.6666666665, ans=0.0 2023-11-27 13:04:34,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3071286.6666666665, ans=15.0 2023-11-27 13:04:40,839 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460700 2023-11-27 13:04:42,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3071286.6666666665, ans=0.125 2023-11-27 13:04:46,908 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3800, loss[loss=0.05479, simple_loss=0.07127, pruned_loss=0.009024, audio_tagging_loss=0.01013, over 14714.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08987, pruned_loss=0.0129, audio_tagging_loss=0.008858, over 3050383.81 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:04:55,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3071353.3333333335, ans=0.0 2023-11-27 13:05:25,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3071553.3333333335, ans=0.0 2023-11-27 13:05:26,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3071553.3333333335, ans=0.1 2023-11-27 13:05:31,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=22.5 2023-11-27 13:05:40,412 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460750 2023-11-27 13:05:40,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3071620.0, ans=0.0 2023-11-27 13:05:45,989 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3850, loss[loss=0.07046, simple_loss=0.08683, pruned_loss=0.01647, audio_tagging_loss=0.01057, over 16184.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09113, pruned_loss=0.01302, audio_tagging_loss=0.008891, over 3053680.26 frames. 
], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:06:09,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3071820.0, ans=0.04949747468305833 2023-11-27 13:06:18,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.582e+01 9.212e+01 9.899e+01 1.418e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:06:20,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3071886.6666666665, ans=0.1 2023-11-27 13:06:21,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3071886.6666666665, ans=0.0 2023-11-27 13:06:27,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-27 13:06:35,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3071953.3333333335, ans=0.125 2023-11-27 13:06:37,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460800 2023-11-27 13:06:37,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-27 13:06:43,157 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3900, loss[loss=0.09159, simple_loss=0.1252, pruned_loss=0.0224, audio_tagging_loss=0.006589, over 14392.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09116, pruned_loss=0.01304, audio_tagging_loss=0.008815, over 3045274.77 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:07:12,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-27 13:07:17,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3072220.0, ans=0.125 2023-11-27 13:07:22,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3072220.0, ans=0.125 2023-11-27 13:07:34,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460850 2023-11-27 13:07:40,307 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 3950, loss[loss=0.06122, simple_loss=0.08794, pruned_loss=0.009205, audio_tagging_loss=0.008047, over 16571.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09097, pruned_loss=0.01289, audio_tagging_loss=0.008859, over 3044487.05 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:07:40,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3072353.3333333335, ans=0.0 2023-11-27 13:07:58,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3072420.0, ans=0.0 2023-11-27 13:08:07,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3072486.6666666665, ans=0.1 2023-11-27 13:08:14,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.99 vs. 
limit=15.0 2023-11-27 13:08:15,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.964e+01 9.425e+01 1.000e+02 1.456e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 13:08:16,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-27 13:08:24,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3072553.3333333335, ans=0.2 2023-11-27 13:08:33,494 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460900 2023-11-27 13:08:39,771 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4000, loss[loss=0.05444, simple_loss=0.06516, pruned_loss=0.00965, audio_tagging_loss=0.01221, over 14310.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09073, pruned_loss=0.01266, audio_tagging_loss=0.008952, over 3042459.71 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:09:31,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 460950 2023-11-27 13:09:37,346 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4050, loss[loss=0.06099, simple_loss=0.08351, pruned_loss=0.01034, audio_tagging_loss=0.008897, over 14602.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09144, pruned_loss=0.01282, audio_tagging_loss=0.008969, over 3048776.88 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:09:37,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3073020.0, ans=0.125 2023-11-27 13:09:43,861 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:09:55,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3073086.6666666665, ans=0.125 2023-11-27 13:10:00,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3073153.3333333335, ans=0.025 2023-11-27 13:10:11,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3073220.0, ans=0.125 2023-11-27 13:10:12,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3073220.0, ans=15.0 2023-11-27 13:10:13,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.613e+01 9.165e+01 1.034e+02 1.408e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:10:16,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3073220.0, ans=12.0 2023-11-27 13:10:28,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461000 2023-11-27 13:10:34,545 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4100, loss[loss=0.06393, simple_loss=0.07268, pruned_loss=0.01454, audio_tagging_loss=0.01304, over 13733.00 frames. 
], tot_loss[loss=0.06761, simple_loss=0.09142, pruned_loss=0.01286, audio_tagging_loss=0.00904, over 3045147.16 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:10:40,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-27 13:10:49,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3073420.0, ans=0.0 2023-11-27 13:10:54,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3073420.0, ans=0.125 2023-11-27 13:11:13,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2023-11-27 13:11:25,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3073620.0, ans=0.0 2023-11-27 13:11:26,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461050 2023-11-27 13:11:33,387 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4150, loss[loss=0.05597, simple_loss=0.07912, pruned_loss=0.00603, audio_tagging_loss=0.01038, over 15309.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.0913, pruned_loss=0.01277, audio_tagging_loss=0.008923, over 3045429.85 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:11:37,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3073686.6666666665, ans=0.0 2023-11-27 13:11:51,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2023-11-27 13:11:51,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3073753.3333333335, ans=0.0 2023-11-27 13:12:00,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3073820.0, ans=0.125 2023-11-27 13:12:02,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2023-11-27 13:12:08,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.439e+01 9.039e+01 1.003e+02 1.334e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-27 13:12:18,766 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:12:25,957 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461100 2023-11-27 13:12:30,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2023-11-27 13:12:31,254 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4200, loss[loss=0.05447, simple_loss=0.07443, pruned_loss=0.009489, audio_tagging_loss=0.007763, over 14817.00 frames. 
], tot_loss[loss=0.06715, simple_loss=0.09128, pruned_loss=0.01268, audio_tagging_loss=0.008831, over 3043075.67 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:12:44,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3074086.6666666665, ans=0.015 2023-11-27 13:12:44,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3074086.6666666665, ans=0.125 2023-11-27 13:13:00,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3074153.3333333335, ans=0.125 2023-11-27 13:13:06,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3074220.0, ans=0.0 2023-11-27 13:13:10,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074220.0, ans=0.1 2023-11-27 13:13:23,401 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461150 2023-11-27 13:13:27,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3074353.3333333335, ans=0.0 2023-11-27 13:13:28,873 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4250, loss[loss=0.06431, simple_loss=0.08467, pruned_loss=0.01394, audio_tagging_loss=0.008036, over 15111.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09141, pruned_loss=0.01273, audio_tagging_loss=0.00878, over 3044045.17 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:14:04,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.56 vs. limit=15.0 2023-11-27 13:14:06,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.652e+01 9.233e+01 9.893e+01 1.518e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 13:14:13,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2023-11-27 13:14:20,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.34 vs. limit=22.5 2023-11-27 13:14:20,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461200 2023-11-27 13:14:26,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074686.6666666665, ans=0.125 2023-11-27 13:14:27,648 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4300, loss[loss=0.0727, simple_loss=0.1046, pruned_loss=0.01379, audio_tagging_loss=0.006613, over 14575.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09145, pruned_loss=0.01261, audio_tagging_loss=0.008598, over 3046548.78 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:14:32,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3074686.6666666665, ans=0.2 2023-11-27 13:14:38,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3074753.3333333335, ans=0.0 2023-11-27 13:14:47,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3074753.3333333335, ans=0.0 2023-11-27 13:14:50,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-27 13:15:06,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3074886.6666666665, ans=0.05 2023-11-27 13:15:19,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2023-11-27 13:15:19,944 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461250 2023-11-27 13:15:21,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3074953.3333333335, ans=0.05 2023-11-27 13:15:25,875 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4350, loss[loss=0.08297, simple_loss=0.1072, pruned_loss=0.02126, audio_tagging_loss=0.008107, over 15013.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09157, pruned_loss=0.01278, audio_tagging_loss=0.008569, over 3038144.50 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:15:27,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2023-11-27 13:16:00,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3075220.0, ans=0.0 2023-11-27 13:16:02,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.774e+01 9.357e+01 9.883e+01 1.317e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:16:04,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3075220.0, ans=0.1 2023-11-27 13:16:18,148 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461300 2023-11-27 13:16:23,504 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4400, loss[loss=0.07818, simple_loss=0.113, pruned_loss=0.01413, audio_tagging_loss=0.007562, over 15271.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09115, pruned_loss=0.01271, audio_tagging_loss=0.008662, over 3044300.84 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:16:29,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3075353.3333333335, ans=0.125 2023-11-27 13:17:15,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461350 2023-11-27 13:17:21,481 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4450, loss[loss=0.0857, simple_loss=0.1208, pruned_loss=0.01738, audio_tagging_loss=0.007927, over 15112.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09128, pruned_loss=0.01279, audio_tagging_loss=0.008531, over 3050972.45 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:17:35,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3075753.3333333335, ans=0.0 2023-11-27 13:17:43,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3075753.3333333335, ans=0.125 2023-11-27 13:17:50,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3075820.0, ans=10.0 2023-11-27 13:17:58,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.640e+01 9.339e+01 1.014e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 13:17:58,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3075886.6666666665, ans=0.0 2023-11-27 13:18:01,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2023-11-27 13:18:07,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3075953.3333333335, ans=0.125 2023-11-27 13:18:14,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461400 2023-11-27 13:18:20,238 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4500, loss[loss=0.07178, simple_loss=0.0928, pruned_loss=0.01464, audio_tagging_loss=0.01074, over 15142.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.0915, pruned_loss=0.013, audio_tagging_loss=0.008477, over 3052986.67 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:18:23,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3076020.0, ans=15.0 2023-11-27 13:18:46,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3076153.3333333335, ans=0.015 2023-11-27 13:18:53,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=22.5 2023-11-27 13:19:08,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3076286.6666666665, ans=0.0 2023-11-27 13:19:11,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461450 2023-11-27 13:19:17,602 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4550, loss[loss=0.05043, simple_loss=0.06539, pruned_loss=0.008689, audio_tagging_loss=0.009046, over 14842.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09106, pruned_loss=0.01289, audio_tagging_loss=0.008544, over 3047983.22 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:19:30,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3076420.0, ans=0.125 2023-11-27 13:19:46,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3076486.6666666665, ans=0.125 2023-11-27 13:19:53,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3076553.3333333335, ans=0.125 2023-11-27 13:19:54,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.672e+01 9.431e+01 1.029e+02 1.211e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 13:19:54,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3076553.3333333335, ans=0.125 2023-11-27 13:19:56,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2023-11-27 13:19:59,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0 2023-11-27 13:20:01,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3076553.3333333335, ans=0.0 2023-11-27 13:20:03,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-27 13:20:04,922 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:20:09,216 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461500 2023-11-27 13:20:11,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3076620.0, ans=0.125 2023-11-27 13:20:14,516 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4600, loss[loss=0.06215, simple_loss=0.08111, pruned_loss=0.01382, audio_tagging_loss=0.007776, over 13938.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09078, pruned_loss=0.01286, audio_tagging_loss=0.008619, over 3045889.13 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:20:39,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-11-27 13:20:53,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3076886.6666666665, ans=0.2 2023-11-27 13:21:08,137 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461550 2023-11-27 13:21:12,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3077020.0, ans=0.2 2023-11-27 13:21:13,508 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4650, loss[loss=0.065, simple_loss=0.0934, pruned_loss=0.0107, audio_tagging_loss=0.0076, over 15544.00 frames. 
], tot_loss[loss=0.06633, simple_loss=0.08989, pruned_loss=0.01265, audio_tagging_loss=0.008742, over 3047517.50 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:21:49,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.746e+01 9.217e+01 9.897e+01 1.196e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 13:22:01,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2023-11-27 13:22:02,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3077286.6666666665, ans=0.125 2023-11-27 13:22:04,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461600 2023-11-27 13:22:04,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3077286.6666666665, ans=0.0 2023-11-27 13:22:10,672 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4700, loss[loss=0.05985, simple_loss=0.07398, pruned_loss=0.01135, audio_tagging_loss=0.01151, over 15505.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09065, pruned_loss=0.01278, audio_tagging_loss=0.008885, over 3046522.66 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:22:43,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3077486.6666666665, ans=0.125 2023-11-27 13:22:50,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3077553.3333333335, ans=0.2 2023-11-27 13:22:57,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3077620.0, ans=10.0 2023-11-27 13:23:02,777 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461650 2023-11-27 13:23:08,072 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4750, loss[loss=0.09081, simple_loss=0.1166, pruned_loss=0.0219, audio_tagging_loss=0.01059, over 15408.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09049, pruned_loss=0.01275, audio_tagging_loss=0.00899, over 3048244.54 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:23:08,437 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:23:21,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-27 13:23:33,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3077820.0, ans=0.125 2023-11-27 13:23:45,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.817e+01 9.459e+01 1.019e+02 1.212e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 13:24:00,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461700 2023-11-27 13:24:06,653 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4800, loss[loss=0.06585, simple_loss=0.09021, pruned_loss=0.009623, audio_tagging_loss=0.01112, over 15165.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09065, pruned_loss=0.01282, audio_tagging_loss=0.009084, over 3047928.42 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:24:28,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3078153.3333333335, ans=0.0 2023-11-27 13:24:47,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078220.0, ans=0.1 2023-11-27 13:24:51,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3078286.6666666665, ans=0.1 2023-11-27 13:24:57,847 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-27 13:24:59,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078286.6666666665, ans=0.1 2023-11-27 13:25:03,199 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4850, loss[loss=0.06185, simple_loss=0.07995, pruned_loss=0.01124, audio_tagging_loss=0.01063, over 15719.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09019, pruned_loss=0.01273, audio_tagging_loss=0.009186, over 3043500.68 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:25:03,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3078353.3333333335, ans=0.0 2023-11-27 13:25:33,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-27 13:25:35,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-27 13:25:40,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.650e+01 9.327e+01 1.023e+02 1.195e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 13:25:54,482 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-27 13:26:00,786 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4900, loss[loss=0.06617, simple_loss=0.08508, pruned_loss=0.01311, audio_tagging_loss=0.01052, over 15453.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09088, pruned_loss=0.01274, audio_tagging_loss=0.009124, over 3044354.33 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:26:02,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3078686.6666666665, ans=0.0 2023-11-27 13:26:10,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3078686.6666666665, ans=0.0 2023-11-27 13:26:21,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0 2023-11-27 13:26:23,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. 
limit=22.5 2023-11-27 13:26:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3078820.0, ans=0.125 2023-11-27 13:26:27,637 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:26:42,856 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:26:45,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3078953.3333333335, ans=0.2 2023-11-27 13:26:52,531 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-27 13:26:57,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3079020.0, ans=0.2 2023-11-27 13:26:58,555 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 4950, loss[loss=0.09066, simple_loss=0.1257, pruned_loss=0.02157, audio_tagging_loss=0.006234, over 15858.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09098, pruned_loss=0.01278, audio_tagging_loss=0.008908, over 3043132.16 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:27:18,818 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:27:34,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.478e+01 9.080e+01 9.742e+01 1.240e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:27:37,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. limit=10.0 2023-11-27 13:27:50,475 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-27 13:27:55,869 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5000, loss[loss=0.05365, simple_loss=0.07465, pruned_loss=0.00668, audio_tagging_loss=0.009643, over 15335.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09141, pruned_loss=0.01284, audio_tagging_loss=0.008747, over 3048952.98 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:28:03,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3079353.3333333335, ans=0.125 2023-11-27 13:28:27,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=15.0 2023-11-27 13:28:45,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079620.0, ans=0.1 2023-11-27 13:28:47,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-27 13:28:47,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-27 13:28:52,928 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5050, loss[loss=0.03557, simple_loss=0.0438, pruned_loss=0.004385, audio_tagging_loss=0.009284, over 16280.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09109, pruned_loss=0.01278, audio_tagging_loss=0.008782, over 3045347.10 frames. 
], batch size: 66, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:28:59,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3079686.6666666665, ans=0.125 2023-11-27 13:29:26,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3079886.6666666665, ans=0.025 2023-11-27 13:29:29,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.853e+01 9.456e+01 1.016e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 13:29:38,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3079953.3333333335, ans=0.0 2023-11-27 13:29:43,985 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-27 13:29:50,854 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5100, loss[loss=0.08217, simple_loss=0.1129, pruned_loss=0.01703, audio_tagging_loss=0.008676, over 15189.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09064, pruned_loss=0.01276, audio_tagging_loss=0.00878, over 3044041.58 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:29:51,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-27 13:30:01,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080086.6666666665, ans=0.0 2023-11-27 13:30:13,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3080153.3333333335, ans=0.125 2023-11-27 13:30:18,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-27 13:30:29,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2023-11-27 13:30:34,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3080220.0, ans=0.0 2023-11-27 13:30:42,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-27 13:30:44,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0 2023-11-27 13:30:48,254 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5150, loss[loss=0.08898, simple_loss=0.1309, pruned_loss=0.01701, audio_tagging_loss=0.006499, over 16450.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09058, pruned_loss=0.01271, audio_tagging_loss=0.008778, over 3045432.55 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:31:26,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.502e+01 9.223e+01 9.833e+01 1.321e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 13:31:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3080553.3333333335, ans=0.0 2023-11-27 13:31:29,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-27 13:31:39,964 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-27 13:31:44,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3080686.6666666665, ans=0.0 2023-11-27 13:31:45,306 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5200, loss[loss=0.06714, simple_loss=0.09597, pruned_loss=0.008979, audio_tagging_loss=0.01018, over 15917.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09109, pruned_loss=0.01268, audio_tagging_loss=0.008745, over 3039740.74 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:32:00,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3080753.3333333335, ans=0.0 2023-11-27 13:32:21,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3080886.6666666665, ans=0.125 2023-11-27 13:32:28,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3080886.6666666665, ans=0.2 2023-11-27 13:32:36,145 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-27 13:32:42,074 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5250, loss[loss=0.06486, simple_loss=0.08875, pruned_loss=0.009953, audio_tagging_loss=0.01053, over 14568.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09088, pruned_loss=0.01273, audio_tagging_loss=0.008707, over 3041026.58 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:32:47,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3081020.0, ans=0.125 2023-11-27 13:33:20,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.787e+01 9.211e+01 1.001e+02 1.224e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:33:34,947 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-27 13:33:40,676 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5300, loss[loss=0.06991, simple_loss=0.09036, pruned_loss=0.01347, audio_tagging_loss=0.01126, over 14649.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09097, pruned_loss=0.01274, audio_tagging_loss=0.008678, over 3044084.28 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:34:02,843 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:34:04,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3081486.6666666665, ans=0.125 2023-11-27 13:34:13,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3081553.3333333335, ans=0.0 2023-11-27 13:34:18,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3081553.3333333335, ans=0.0 2023-11-27 13:34:30,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3081620.0, ans=0.0 2023-11-27 13:34:30,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3081620.0, ans=0.0 2023-11-27 13:34:32,353 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-27 13:34:37,737 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5350, loss[loss=0.07759, simple_loss=0.1065, pruned_loss=0.01332, audio_tagging_loss=0.01103, over 14593.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09128, pruned_loss=0.01274, audio_tagging_loss=0.00869, over 3046437.36 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:34:44,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2023-11-27 13:34:45,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3081686.6666666665, ans=0.0 2023-11-27 13:34:51,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-27 13:34:52,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3081753.3333333335, ans=10.0 2023-11-27 13:35:00,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3081820.0, ans=0.0 2023-11-27 13:35:14,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3081886.6666666665, ans=0.0 2023-11-27 13:35:14,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3081886.6666666665, ans=0.125 2023-11-27 13:35:15,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3081886.6666666665, ans=0.125 2023-11-27 13:35:16,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.714e+01 9.357e+01 9.937e+01 1.176e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:35:29,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-27 13:35:34,983 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5400, loss[loss=0.09634, simple_loss=0.1215, pruned_loss=0.02749, audio_tagging_loss=0.008108, over 14460.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.0916, pruned_loss=0.01291, audio_tagging_loss=0.008649, over 3042040.07 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:35:42,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3082020.0, ans=0.125 2023-11-27 13:35:44,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3082020.0, ans=0.04949747468305833 2023-11-27 13:35:57,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3082086.6666666665, ans=0.2 2023-11-27 13:36:03,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3082153.3333333335, ans=0.125 2023-11-27 13:36:27,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-27 13:36:28,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3082286.6666666665, ans=0.2 2023-11-27 13:36:33,144 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5450, loss[loss=0.05855, simple_loss=0.0771, pruned_loss=0.01025, audio_tagging_loss=0.009746, over 15462.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09119, pruned_loss=0.01288, audio_tagging_loss=0.008734, over 3037699.56 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:36:38,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3082353.3333333335, ans=0.125 2023-11-27 13:36:43,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3082353.3333333335, ans=0.1 2023-11-27 13:36:53,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3082420.0, ans=0.0 2023-11-27 13:36:56,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3082486.6666666665, ans=0.02 2023-11-27 13:36:59,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-27 13:37:05,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3082486.6666666665, ans=0.0 2023-11-27 13:37:12,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.623e+01 9.313e+01 1.025e+02 1.327e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:37:17,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3082553.3333333335, ans=0.0 2023-11-27 13:37:25,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-27 13:37:31,090 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5500, loss[loss=0.07118, simple_loss=0.0942, pruned_loss=0.01381, audio_tagging_loss=0.01027, over 15787.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09144, pruned_loss=0.01295, audio_tagging_loss=0.008703, over 3040945.89 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:37:35,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-27 13:37:45,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. 
limit=22.5 2023-11-27 13:37:57,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3082820.0, ans=0.125 2023-11-27 13:37:57,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-27 13:38:23,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-27 13:38:23,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3082953.3333333335, ans=0.0 2023-11-27 13:38:24,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3082953.3333333335, ans=0.125 2023-11-27 13:38:28,412 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5550, loss[loss=0.07887, simple_loss=0.09918, pruned_loss=0.01957, audio_tagging_loss=0.009702, over 15313.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09048, pruned_loss=0.01282, audio_tagging_loss=0.008942, over 3036754.25 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:38:34,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3083020.0, ans=0.125 2023-11-27 13:38:34,561 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:38:34,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3083020.0, ans=0.125 2023-11-27 13:39:09,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.682e+01 9.087e+01 1.004e+02 1.719e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 13:39:17,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-27 13:39:21,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-27 13:39:27,220 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5600, loss[loss=0.08274, simple_loss=0.1212, pruned_loss=0.01578, audio_tagging_loss=0.006353, over 16163.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09189, pruned_loss=0.01309, audio_tagging_loss=0.00899, over 3046945.23 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:39:36,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-11-27 13:39:50,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3083486.6666666665, ans=0.015 2023-11-27 13:40:00,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2023-11-27 13:40:04,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3083553.3333333335, ans=0.0 2023-11-27 13:40:12,348 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:40:16,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-27 13:40:19,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-27 13:40:25,102 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5650, loss[loss=0.05937, simple_loss=0.07665, pruned_loss=0.01259, audio_tagging_loss=0.00846, over 15254.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09245, pruned_loss=0.01311, audio_tagging_loss=0.008954, over 3052567.02 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:40:31,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3083686.6666666665, ans=0.0 2023-11-27 13:40:54,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3083820.0, ans=0.2 2023-11-27 13:41:03,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2023-11-27 13:41:04,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=12.0 2023-11-27 13:41:05,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.523e+01 8.985e+01 9.871e+01 1.258e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 13:41:16,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-27 13:41:17,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3083953.3333333335, ans=0.0 2023-11-27 13:41:22,700 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5700, loss[loss=0.05776, simple_loss=0.08059, pruned_loss=0.01024, audio_tagging_loss=0.007218, over 16463.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09211, pruned_loss=0.01308, audio_tagging_loss=0.008938, over 3043951.50 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:41:26,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5 2023-11-27 13:41:44,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3084086.6666666665, ans=0.0 2023-11-27 13:41:52,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3084153.3333333335, ans=0.125 2023-11-27 13:42:02,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2023-11-27 13:42:08,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084286.6666666665, ans=0.1 2023-11-27 13:42:14,985 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-27 13:42:21,556 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5750, loss[loss=0.06128, simple_loss=0.07894, pruned_loss=0.01138, audio_tagging_loss=0.01043, over 16140.00 frames. 
], tot_loss[loss=0.06736, simple_loss=0.09133, pruned_loss=0.01289, audio_tagging_loss=0.008806, over 3052399.37 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:42:23,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3084353.3333333335, ans=0.125 2023-11-27 13:42:44,526 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:42:47,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3084486.6666666665, ans=0.2 2023-11-27 13:43:01,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.392e+01 9.189e+01 1.014e+02 1.266e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 13:43:06,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2023-11-27 13:43:10,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3084620.0, ans=0.5 2023-11-27 13:43:11,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3084620.0, ans=0.125 2023-11-27 13:43:13,302 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-27 13:43:18,804 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5800, loss[loss=0.06286, simple_loss=0.0825, pruned_loss=0.01353, audio_tagging_loss=0.008078, over 14658.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09208, pruned_loss=0.0131, audio_tagging_loss=0.008674, over 3048889.77 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:43:24,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3084686.6666666665, ans=0.125 2023-11-27 13:43:35,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3084753.3333333335, ans=0.2 2023-11-27 13:43:45,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3084820.0, ans=0.125 2023-11-27 13:44:11,190 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-27 13:44:11,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3084953.3333333335, ans=0.1 2023-11-27 13:44:16,476 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5850, loss[loss=0.08058, simple_loss=0.1111, pruned_loss=0.01769, audio_tagging_loss=0.007351, over 15752.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.0922, pruned_loss=0.01314, audio_tagging_loss=0.008609, over 3050316.05 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:44:22,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3085020.0, ans=0.125 2023-11-27 13:44:41,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3085153.3333333335, ans=0.0 2023-11-27 13:44:57,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.603e+01 9.114e+01 9.888e+01 1.396e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-27 13:45:07,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:45:08,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-27 13:45:14,809 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5900, loss[loss=0.05238, simple_loss=0.07099, pruned_loss=0.008567, audio_tagging_loss=0.008314, over 13982.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09142, pruned_loss=0.01296, audio_tagging_loss=0.008547, over 3047364.75 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:45:30,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085420.0, ans=0.1 2023-11-27 13:45:47,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2023-11-27 13:45:54,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3085553.3333333335, ans=0.125 2023-11-27 13:46:07,469 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-27 13:46:12,878 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 5950, loss[loss=0.06184, simple_loss=0.0858, pruned_loss=0.01133, audio_tagging_loss=0.00761, over 16064.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0901, pruned_loss=0.01267, audio_tagging_loss=0.008633, over 3053068.13 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:46:18,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3085686.6666666665, ans=0.125 2023-11-27 13:46:53,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.517e+01 9.163e+01 1.018e+02 1.224e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:47:04,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-27 13:47:10,179 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6000, loss[loss=0.07349, simple_loss=0.09674, pruned_loss=0.01584, audio_tagging_loss=0.009282, over 14981.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09075, pruned_loss=0.01274, audio_tagging_loss=0.008633, over 3049526.85 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:47:10,180 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 13:47:23,456 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7566, 1.9513, 3.3991, 3.4321, 3.0714, 3.3636, 3.2426, 3.4315], device='cuda:1') 2023-11-27 13:47:44,798 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05076, pruned_loss=0.005225, audio_tagging_loss=0.02706, over 4681554.00 frames. 
2023-11-27 13:47:44,799 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 13:47:46,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2023-11-27 13:48:10,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3086153.3333333335, ans=0.1 2023-11-27 13:48:27,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3086220.0, ans=0.125 2023-11-27 13:48:28,014 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:48:29,937 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:48:36,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-27 13:48:38,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3086286.6666666665, ans=0.125 2023-11-27 13:48:40,424 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:48:42,325 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6050, loss[loss=0.06388, simple_loss=0.09279, pruned_loss=0.008989, audio_tagging_loss=0.008494, over 15379.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09165, pruned_loss=0.01284, audio_tagging_loss=0.00865, over 3050539.63 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:48:56,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3086420.0, ans=0.04949747468305833 2023-11-27 13:48:58,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2023-11-27 13:49:06,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3086486.6666666665, ans=0.0 2023-11-27 13:49:24,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.673e+01 9.372e+01 1.019e+02 1.327e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 13:49:34,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-27 13:49:34,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3086620.0, ans=0.0 2023-11-27 13:49:36,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086620.0, ans=0.1 2023-11-27 13:49:40,139 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6100, loss[loss=0.07216, simple_loss=0.09879, pruned_loss=0.01344, audio_tagging_loss=0.009323, over 14579.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09162, pruned_loss=0.01298, audio_tagging_loss=0.00869, over 3052592.53 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:49:40,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-27 13:49:43,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3086686.6666666665, ans=0.2 2023-11-27 13:49:46,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-27 13:50:03,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086820.0, ans=0.1 2023-11-27 13:50:32,457 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-27 13:50:35,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3086953.3333333335, ans=0.0 2023-11-27 13:50:38,828 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6150, loss[loss=0.07096, simple_loss=0.09988, pruned_loss=0.01266, audio_tagging_loss=0.008355, over 16606.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09183, pruned_loss=0.01294, audio_tagging_loss=0.008757, over 3048486.70 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:50:56,398 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:51:02,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3087153.3333333335, ans=0.09899494936611666 2023-11-27 13:51:08,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3087153.3333333335, ans=0.125 2023-11-27 13:51:12,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087220.0, ans=0.1 2023-11-27 13:51:20,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.580e+01 9.298e+01 1.013e+02 1.298e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 13:51:28,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3087286.6666666665, ans=0.0 2023-11-27 13:51:31,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-27 13:51:33,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3087286.6666666665, ans=0.2 2023-11-27 13:51:36,711 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6200, loss[loss=0.06108, simple_loss=0.08179, pruned_loss=0.009991, audio_tagging_loss=0.0102, over 15628.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09099, pruned_loss=0.01279, audio_tagging_loss=0.008823, over 3055328.64 frames. 
], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:52:03,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087486.6666666665, ans=0.1 2023-11-27 13:52:05,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3087486.6666666665, ans=0.0 2023-11-27 13:52:16,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3087553.3333333335, ans=0.2 2023-11-27 13:52:20,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3087553.3333333335, ans=0.125 2023-11-27 13:52:28,823 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-27 13:52:34,174 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6250, loss[loss=0.0629, simple_loss=0.08925, pruned_loss=0.007949, audio_tagging_loss=0.01033, over 15143.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09068, pruned_loss=0.01267, audio_tagging_loss=0.008835, over 3057365.63 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:52:41,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3087686.6666666665, ans=0.125 2023-11-27 13:52:55,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-27 13:52:57,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3087820.0, ans=0.125 2023-11-27 13:53:03,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3087820.0, ans=0.125 2023-11-27 13:53:14,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3087886.6666666665, ans=0.125 2023-11-27 13:53:16,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.648e+01 9.152e+01 1.003e+02 1.287e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 13:53:20,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087953.3333333335, ans=0.1 2023-11-27 13:53:26,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-27 13:53:32,902 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6300, loss[loss=0.05481, simple_loss=0.07452, pruned_loss=0.00831, audio_tagging_loss=0.009241, over 15547.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0906, pruned_loss=0.01267, audio_tagging_loss=0.00887, over 3054067.55 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:53:46,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3088086.6666666665, ans=0.0 2023-11-27 13:53:48,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3088086.6666666665, ans=0.125 2023-11-27 13:53:57,546 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:54:01,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088153.3333333335, ans=0.1 2023-11-27 13:54:03,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-27 13:54:05,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088153.3333333335, ans=0.1 2023-11-27 13:54:05,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-27 13:54:11,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3088220.0, ans=0.125 2023-11-27 13:54:25,982 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-27 13:54:30,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3088353.3333333335, ans=0.0 2023-11-27 13:54:31,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2023-11-27 13:54:31,597 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6350, loss[loss=0.07504, simple_loss=0.1034, pruned_loss=0.01636, audio_tagging_loss=0.006994, over 15170.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09043, pruned_loss=0.01269, audio_tagging_loss=0.00889, over 3046914.16 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:54:40,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3088353.3333333335, ans=0.2 2023-11-27 13:54:44,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3088420.0, ans=0.125 2023-11-27 13:54:59,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3088486.6666666665, ans=0.125 2023-11-27 13:55:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3088553.3333333335, ans=0.0 2023-11-27 13:55:13,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.528e+01 9.081e+01 9.909e+01 1.480e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:55:23,371 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-27 13:55:27,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.24 vs. 
limit=15.0 2023-11-27 13:55:28,794 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6400, loss[loss=0.06985, simple_loss=0.09236, pruned_loss=0.01518, audio_tagging_loss=0.008494, over 15758.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09066, pruned_loss=0.0127, audio_tagging_loss=0.008946, over 3042179.86 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:55:30,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-27 13:55:44,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-27 13:56:09,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3088886.6666666665, ans=0.95 2023-11-27 13:56:10,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3088886.6666666665, ans=0.0 2023-11-27 13:56:12,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=22.5 2023-11-27 13:56:13,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-27 13:56:14,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3088953.3333333335, ans=0.125 2023-11-27 13:56:17,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3088953.3333333335, ans=0.1 2023-11-27 13:56:20,295 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-27 13:56:25,827 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6450, loss[loss=0.05129, simple_loss=0.06127, pruned_loss=0.008893, audio_tagging_loss=0.01176, over 14047.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09003, pruned_loss=0.0127, audio_tagging_loss=0.009114, over 3040410.20 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:56:44,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3089086.6666666665, ans=0.07 2023-11-27 13:56:50,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3089153.3333333335, ans=22.5 2023-11-27 13:57:07,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.614e+01 9.257e+01 9.847e+01 1.533e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 13:57:08,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3089220.0, ans=0.0 2023-11-27 13:57:12,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3089286.6666666665, ans=0.125 2023-11-27 13:57:19,464 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-27 13:57:23,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3089286.6666666665, ans=0.0 2023-11-27 13:57:25,806 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6500, loss[loss=0.04868, simple_loss=0.06427, pruned_loss=0.006111, audio_tagging_loss=0.01043, over 15441.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08901, pruned_loss=0.01245, audio_tagging_loss=0.009094, over 3032805.51 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:57:35,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3089353.3333333335, ans=0.125 2023-11-27 13:57:44,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0 2023-11-27 13:57:45,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2023-11-27 13:57:52,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3089486.6666666665, ans=0.0 2023-11-27 13:58:05,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3089553.3333333335, ans=0.1 2023-11-27 13:58:06,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3089553.3333333335, ans=0.1 2023-11-27 13:58:06,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3089553.3333333335, ans=0.1 2023-11-27 13:58:17,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-27 13:58:19,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3089620.0, ans=0.125 2023-11-27 13:58:22,706 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6550, loss[loss=0.07333, simple_loss=0.1027, pruned_loss=0.01383, audio_tagging_loss=0.008135, over 15759.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08896, pruned_loss=0.01254, audio_tagging_loss=0.008982, over 3031136.97 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:58:22,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3089686.6666666665, ans=0.0 2023-11-27 13:58:43,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0 2023-11-27 13:59:04,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.565e+01 9.161e+01 9.963e+01 1.311e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 13:59:14,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-27 13:59:19,703 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6600, loss[loss=0.06109, simple_loss=0.0787, pruned_loss=0.009573, audio_tagging_loss=0.01217, over 14780.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08904, pruned_loss=0.01243, audio_tagging_loss=0.008862, over 3043749.85 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:59:55,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=3090220.0, ans=22.5 2023-11-27 14:00:03,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3090220.0, ans=0.125 2023-11-27 14:00:12,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-27 14:00:17,671 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6650, loss[loss=0.07944, simple_loss=0.1052, pruned_loss=0.01888, audio_tagging_loss=0.007957, over 14796.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09004, pruned_loss=0.01265, audio_tagging_loss=0.008795, over 3044292.14 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:00:44,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3090486.6666666665, ans=0.125 2023-11-27 14:00:58,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.791e+01 9.442e+01 1.009e+02 1.378e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:01:09,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-27 14:01:15,147 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6700, loss[loss=0.06841, simple_loss=0.08995, pruned_loss=0.01175, audio_tagging_loss=0.01169, over 15652.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09066, pruned_loss=0.01281, audio_tagging_loss=0.008661, over 3046512.24 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:01:29,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3090753.3333333335, ans=0.0 2023-11-27 14:01:35,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3090753.3333333335, ans=0.0 2023-11-27 14:01:40,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3090820.0, ans=0.1 2023-11-27 14:02:06,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-27 14:02:09,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3090953.3333333335, ans=0.0 2023-11-27 14:02:10,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3090953.3333333335, ans=0.125 2023-11-27 14:02:12,094 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6750, loss[loss=0.04317, simple_loss=0.05583, pruned_loss=0.005546, audio_tagging_loss=0.009706, over 15044.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.0899, pruned_loss=0.01258, audio_tagging_loss=0.008641, over 3039800.13 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:02:17,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3091020.0, ans=0.125 2023-11-27 14:02:35,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3091153.3333333335, ans=10.0 2023-11-27 14:02:36,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-27 14:02:46,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3091220.0, ans=0.2 2023-11-27 14:02:46,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3091220.0, ans=0.2 2023-11-27 14:02:53,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.433e+01 9.032e+01 9.783e+01 1.125e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:03:01,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2023-11-27 14:03:03,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-27 14:03:10,156 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6800, loss[loss=0.06469, simple_loss=0.08862, pruned_loss=0.01357, audio_tagging_loss=0.006802, over 14947.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08873, pruned_loss=0.0123, audio_tagging_loss=0.008686, over 3036598.45 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:03:17,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=22.5 2023-11-27 14:03:18,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3091353.3333333335, ans=0.125 2023-11-27 14:03:22,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. 
limit=15.0 2023-11-27 14:03:36,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3091486.6666666665, ans=0.125 2023-11-27 14:03:45,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091553.3333333335, ans=0.1 2023-11-27 14:03:51,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3091553.3333333335, ans=0.2 2023-11-27 14:03:59,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091620.0, ans=0.1 2023-11-27 14:04:01,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-27 14:04:01,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3091620.0, ans=0.0 2023-11-27 14:04:06,900 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6850, loss[loss=0.06405, simple_loss=0.07703, pruned_loss=0.01352, audio_tagging_loss=0.01201, over 13975.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08868, pruned_loss=0.01216, audio_tagging_loss=0.008715, over 3045424.70 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:04:10,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-27 14:04:12,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3091686.6666666665, ans=0.0 2023-11-27 14:04:19,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-27 14:04:27,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:04:28,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3091820.0, ans=0.1 2023-11-27 14:04:48,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091886.6666666665, ans=0.1 2023-11-27 14:04:49,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.738e+01 9.106e+01 9.965e+01 1.501e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:04:57,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3091953.3333333335, ans=0.125 2023-11-27 14:04:58,351 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:04:59,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-27 14:05:04,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092020.0, ans=0.1 2023-11-27 14:05:04,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2023-11-27 14:05:05,184 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6900, loss[loss=0.07167, simple_loss=0.09878, pruned_loss=0.01584, audio_tagging_loss=0.006437, over 16275.00 frames. 
], tot_loss[loss=0.06582, simple_loss=0.08941, pruned_loss=0.01241, audio_tagging_loss=0.008704, over 3050098.79 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:05:23,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3092086.6666666665, ans=0.125 2023-11-27 14:05:25,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3092086.6666666665, ans=0.0 2023-11-27 14:05:34,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092153.3333333335, ans=0.1 2023-11-27 14:05:36,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2023-11-27 14:05:53,795 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:05:57,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-27 14:06:03,940 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 6950, loss[loss=0.06028, simple_loss=0.08192, pruned_loss=0.008472, audio_tagging_loss=0.01085, over 16087.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09025, pruned_loss=0.01258, audio_tagging_loss=0.008736, over 3048419.20 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:06:09,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-27 14:06:17,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092420.0, ans=0.1 2023-11-27 14:06:26,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3092486.6666666665, ans=0.2 2023-11-27 14:06:37,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3092553.3333333335, ans=0.05 2023-11-27 14:06:39,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3092553.3333333335, ans=0.125 2023-11-27 14:06:46,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.400e+01 9.204e+01 9.755e+01 1.289e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-27 14:06:55,646 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-27 14:06:58,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3092620.0, ans=0.0 2023-11-27 14:07:01,084 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7000, loss[loss=0.0768, simple_loss=0.1062, pruned_loss=0.0163, audio_tagging_loss=0.007404, over 15818.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09002, pruned_loss=0.01238, audio_tagging_loss=0.008774, over 3053446.95 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:07:52,715 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-27 14:07:58,861 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7050, loss[loss=0.06106, simple_loss=0.08536, pruned_loss=0.007766, audio_tagging_loss=0.01061, over 15357.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.0899, pruned_loss=0.01248, audio_tagging_loss=0.008857, over 3050829.22 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:08:07,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3093020.0, ans=0.125 2023-11-27 14:08:14,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3093086.6666666665, ans=0.04949747468305833 2023-11-27 14:08:25,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3093153.3333333335, ans=0.125 2023-11-27 14:08:31,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2023-11-27 14:08:41,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.528e+01 9.043e+01 9.552e+01 1.279e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 14:08:49,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-27 14:08:50,140 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-27 14:08:58,745 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7100, loss[loss=0.0726, simple_loss=0.09643, pruned_loss=0.01381, audio_tagging_loss=0.01058, over 15319.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08995, pruned_loss=0.01248, audio_tagging_loss=0.008927, over 3052837.59 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:09:29,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3093486.6666666665, ans=0.125 2023-11-27 14:09:50,870 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-27 14:09:56,370 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7150, loss[loss=0.06749, simple_loss=0.09658, pruned_loss=0.01082, audio_tagging_loss=0.008383, over 14888.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0907, pruned_loss=0.01268, audio_tagging_loss=0.008952, over 3051909.76 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:10:05,522 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:10:07,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-11-27 14:10:09,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3093753.3333333335, ans=0.125 2023-11-27 14:10:14,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3093753.3333333335, ans=0.125 2023-11-27 14:10:31,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. 
limit=6.0 2023-11-27 14:10:38,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3093886.6666666665, ans=0.125 2023-11-27 14:10:39,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.709e+01 9.080e+01 1.002e+02 1.169e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 14:10:47,559 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-27 14:10:48,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3093953.3333333335, ans=0.0 2023-11-27 14:10:53,056 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7200, loss[loss=0.05901, simple_loss=0.09022, pruned_loss=0.0076, audio_tagging_loss=0.0063, over 14298.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09006, pruned_loss=0.01265, audio_tagging_loss=0.009064, over 3048202.43 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:11:02,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094020.0, ans=0.1 2023-11-27 14:11:14,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.08 vs. limit=22.5 2023-11-27 14:11:24,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3094153.3333333335, ans=0.125 2023-11-27 14:11:31,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-27 14:11:45,221 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-27 14:11:46,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3094286.6666666665, ans=0.125 2023-11-27 14:11:48,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3094286.6666666665, ans=0.05 2023-11-27 14:11:50,684 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7250, loss[loss=0.06436, simple_loss=0.08858, pruned_loss=0.01097, audio_tagging_loss=0.009103, over 15328.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.0907, pruned_loss=0.01274, audio_tagging_loss=0.009069, over 3044320.60 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:12:03,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3094420.0, ans=0.125 2023-11-27 14:12:34,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.560e+01 9.107e+01 9.786e+01 1.290e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:12:43,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-27 14:12:43,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3094620.0, ans=0.0 2023-11-27 14:12:49,029 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7300, loss[loss=0.06701, simple_loss=0.09241, pruned_loss=0.01437, audio_tagging_loss=0.006432, over 14709.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09079, pruned_loss=0.01264, audio_tagging_loss=0.008962, over 3039866.55 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:01,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2023-11-27 14:13:20,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3094820.0, ans=0.1 2023-11-27 14:13:40,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-27 14:13:45,853 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7350, loss[loss=0.0564, simple_loss=0.07878, pruned_loss=0.012, audio_tagging_loss=0.005018, over 15231.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.0901, pruned_loss=0.01265, audio_tagging_loss=0.008925, over 3036834.27 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:52,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095020.0, ans=0.1 2023-11-27 14:13:54,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3095020.0, ans=0.125 2023-11-27 14:14:00,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3095086.6666666665, ans=0.0 2023-11-27 14:14:00,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-11-27 14:14:05,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3095086.6666666665, ans=0.0 2023-11-27 14:14:06,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:23,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3095220.0, ans=0.125 2023-11-27 14:14:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3095220.0, ans=0.2 2023-11-27 14:14:29,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.691e+01 9.417e+01 9.998e+01 1.354e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 14:14:37,804 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-27 14:14:43,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-27 14:14:43,849 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7400, loss[loss=0.06375, simple_loss=0.08772, pruned_loss=0.01089, audio_tagging_loss=0.008994, over 15807.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09053, pruned_loss=0.01271, audio_tagging_loss=0.008802, over 3045872.71 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:14:45,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. 
limit=22.5 2023-11-27 14:14:51,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3095353.3333333335, ans=0.125 2023-11-27 14:14:57,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3095420.0, ans=0.125 2023-11-27 14:15:04,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3095420.0, ans=0.125 2023-11-27 14:15:25,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3095553.3333333335, ans=0.125 2023-11-27 14:15:32,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5 2023-11-27 14:15:36,783 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-27 14:15:42,171 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7450, loss[loss=0.0523, simple_loss=0.05586, pruned_loss=0.008915, audio_tagging_loss=0.01545, over 14946.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08998, pruned_loss=0.01272, audio_tagging_loss=0.008782, over 3035420.60 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:15:45,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3095686.6666666665, ans=0.125 2023-11-27 14:16:03,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095820.0, ans=0.1 2023-11-27 14:16:23,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3095886.6666666665, ans=0.0 2023-11-27 14:16:23,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3095886.6666666665, ans=0.125 2023-11-27 14:16:25,881 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.639e+01 9.279e+01 9.819e+01 1.205e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 14:16:33,608 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-27 14:16:34,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0 2023-11-27 14:16:36,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3095953.3333333335, ans=0.05 2023-11-27 14:16:39,348 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7500, loss[loss=0.07924, simple_loss=0.1207, pruned_loss=0.01152, audio_tagging_loss=0.007378, over 13828.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09043, pruned_loss=0.01259, audio_tagging_loss=0.008683, over 3034191.48 frames. 
], batch size: 52, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:16:46,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3096020.0, ans=0.125 2023-11-27 14:16:48,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-27 14:16:51,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-27 14:16:57,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-27 14:16:58,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3096086.6666666665, ans=0.2 2023-11-27 14:17:14,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096220.0, ans=0.1 2023-11-27 14:17:31,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-27 14:17:37,446 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7550, loss[loss=0.06986, simple_loss=0.09282, pruned_loss=0.01768, audio_tagging_loss=0.005773, over 14926.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09096, pruned_loss=0.0129, audio_tagging_loss=0.008679, over 3028633.97 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:17:39,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3096353.3333333335, ans=0.125 2023-11-27 14:17:41,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3096353.3333333335, ans=0.0 2023-11-27 14:17:49,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3096353.3333333335, ans=0.0 2023-11-27 14:18:22,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.787e+01 9.439e+01 1.010e+02 1.313e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:18:29,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3096620.0, ans=0.125 2023-11-27 14:18:31,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-27 14:18:31,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3096620.0, ans=0.0 2023-11-27 14:18:37,278 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7600, loss[loss=0.06197, simple_loss=0.07299, pruned_loss=0.01423, audio_tagging_loss=0.01124, over 16002.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0895, pruned_loss=0.01271, audio_tagging_loss=0.008692, over 3032423.05 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:18:51,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3096753.3333333335, ans=0.125 2023-11-27 14:18:59,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3096820.0, ans=0.09899494936611666 2023-11-27 14:19:06,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3096820.0, ans=0.125 2023-11-27 14:19:13,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3096886.6666666665, ans=0.0 2023-11-27 14:19:20,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3096886.6666666665, ans=0.2 2023-11-27 14:19:28,830 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464550 2023-11-27 14:19:28,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3096953.3333333335, ans=0.2 2023-11-27 14:19:34,219 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7650, loss[loss=0.06522, simple_loss=0.08324, pruned_loss=0.01271, audio_tagging_loss=0.0109, over 15125.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08852, pruned_loss=0.01246, audio_tagging_loss=0.008673, over 3031412.17 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:19:35,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3097020.0, ans=0.125 2023-11-27 14:19:38,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3097020.0, ans=0.125 2023-11-27 14:19:45,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3097086.6666666665, ans=0.2 2023-11-27 14:19:56,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3097153.3333333335, ans=0.05 2023-11-27 14:20:08,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3097220.0, ans=0.0 2023-11-27 14:20:17,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3097220.0, ans=0.0 2023-11-27 14:20:18,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.470e+01 8.990e+01 9.726e+01 1.372e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-27 14:20:22,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097286.6666666665, ans=0.1 2023-11-27 14:20:25,482 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464600 2023-11-27 14:20:25,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2023-11-27 14:20:31,205 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7700, loss[loss=0.08521, simple_loss=0.1148, pruned_loss=0.02207, audio_tagging_loss=0.005738, over 15169.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0887, pruned_loss=0.01262, audio_tagging_loss=0.008639, over 3036236.80 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:20:48,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-11-27 14:20:54,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3097486.6666666665, ans=0.2 2023-11-27 14:20:55,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3097486.6666666665, ans=0.2 2023-11-27 14:20:57,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2023-11-27 14:20:58,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3097486.6666666665, ans=0.125 2023-11-27 14:21:03,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3097486.6666666665, ans=0.1 2023-11-27 14:21:12,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3097553.3333333335, ans=0.0 2023-11-27 14:21:18,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3097620.0, ans=0.0 2023-11-27 14:21:23,390 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464650 2023-11-27 14:21:30,606 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7750, loss[loss=0.04823, simple_loss=0.06154, pruned_loss=0.007384, audio_tagging_loss=0.01008, over 16101.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0878, pruned_loss=0.01255, audio_tagging_loss=0.008806, over 3034646.54 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:21:31,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-27 14:21:33,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3097686.6666666665, ans=0.125 2023-11-27 14:21:45,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2023-11-27 14:22:02,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3097886.6666666665, ans=0.125 2023-11-27 14:22:03,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3097886.6666666665, ans=0.1 2023-11-27 14:22:06,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. 
limit=10.0 2023-11-27 14:22:07,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3097886.6666666665, ans=0.1 2023-11-27 14:22:15,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.645e+01 9.369e+01 1.003e+02 1.399e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 14:22:22,177 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464700 2023-11-27 14:22:27,490 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7800, loss[loss=0.07277, simple_loss=0.102, pruned_loss=0.01289, audio_tagging_loss=0.008877, over 14846.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08906, pruned_loss=0.01258, audio_tagging_loss=0.008689, over 3039368.43 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:22:34,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3098020.0, ans=0.0 2023-11-27 14:22:40,955 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:22:48,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3098153.3333333335, ans=0.0 2023-11-27 14:22:49,005 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:22:58,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-11-27 14:22:58,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-11-27 14:23:15,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3098286.6666666665, ans=0.0 2023-11-27 14:23:18,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3098286.6666666665, ans=0.125 2023-11-27 14:23:19,342 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464750 2023-11-27 14:23:24,835 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7850, loss[loss=0.06976, simple_loss=0.09158, pruned_loss=0.01352, audio_tagging_loss=0.01045, over 15918.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08929, pruned_loss=0.01254, audio_tagging_loss=0.008731, over 3039995.23 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:23:33,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3098353.3333333335, ans=0.125 2023-11-27 14:23:36,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3098420.0, ans=0.2 2023-11-27 14:24:10,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.600e+01 9.119e+01 9.772e+01 1.362e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 14:24:10,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. 
limit=22.5 2023-11-27 14:24:12,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3098620.0, ans=0.2 2023-11-27 14:24:17,254 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464800 2023-11-27 14:24:24,268 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7900, loss[loss=0.07512, simple_loss=0.1025, pruned_loss=0.01619, audio_tagging_loss=0.007665, over 15311.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08956, pruned_loss=0.01245, audio_tagging_loss=0.008884, over 3050952.14 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:24:25,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3098686.6666666665, ans=0.125 2023-11-27 14:24:44,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3098753.3333333335, ans=0.125 2023-11-27 14:24:49,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3098820.0, ans=0.125 2023-11-27 14:24:50,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3098820.0, ans=0.125 2023-11-27 14:25:16,369 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464850 2023-11-27 14:25:16,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3098953.3333333335, ans=0.025 2023-11-27 14:25:22,380 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 7950, loss[loss=0.05651, simple_loss=0.0741, pruned_loss=0.009114, audio_tagging_loss=0.01035, over 14494.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08931, pruned_loss=0.01225, audio_tagging_loss=0.009015, over 3052167.75 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:25:25,856 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:25:38,879 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:25:51,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3099153.3333333335, ans=0.09899494936611666 2023-11-27 14:25:52,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099153.3333333335, ans=0.1 2023-11-27 14:25:54,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. 
limit=22.5 2023-11-27 14:26:00,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3099220.0, ans=0.125 2023-11-27 14:26:07,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.621e+01 8.980e+01 9.722e+01 1.502e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 14:26:14,617 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464900 2023-11-27 14:26:20,178 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8000, loss[loss=0.05997, simple_loss=0.08041, pruned_loss=0.0113, audio_tagging_loss=0.008469, over 15015.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.0899, pruned_loss=0.01238, audio_tagging_loss=0.009071, over 3046749.54 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:26:37,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3099420.0, ans=0.1 2023-11-27 14:26:38,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2023-11-27 14:27:12,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 464950 2023-11-27 14:27:18,048 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8050, loss[loss=0.08235, simple_loss=0.1162, pruned_loss=0.01599, audio_tagging_loss=0.008285, over 14959.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08958, pruned_loss=0.01233, audio_tagging_loss=0.009086, over 3035444.79 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:27:37,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3099753.3333333335, ans=0.0 2023-11-27 14:28:00,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3099886.6666666665, ans=0.0 2023-11-27 14:28:04,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.542e+01 9.133e+01 9.654e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 14:28:04,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099953.3333333335, ans=0.1 2023-11-27 14:28:08,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3099953.3333333335, ans=0.2 2023-11-27 14:28:10,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2023-11-27 14:28:11,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465000 2023-11-27 14:28:17,215 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8100, loss[loss=0.06759, simple_loss=0.09202, pruned_loss=0.01561, audio_tagging_loss=0.005973, over 14895.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08946, pruned_loss=0.01248, audio_tagging_loss=0.008987, over 3032086.78 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:28:17,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3100020.0, ans=0.0 2023-11-27 14:28:27,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. 
limit=22.5 2023-11-27 14:29:09,989 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465050 2023-11-27 14:29:15,407 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8150, loss[loss=0.08771, simple_loss=0.1137, pruned_loss=0.02037, audio_tagging_loss=0.01047, over 15171.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09003, pruned_loss=0.01268, audio_tagging_loss=0.008883, over 3036996.95 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:29:17,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3100353.3333333335, ans=0.125 2023-11-27 14:29:24,268 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:29:24,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3100353.3333333335, ans=0.125 2023-11-27 14:29:26,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3100420.0, ans=0.125 2023-11-27 14:29:34,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3100420.0, ans=0.125 2023-11-27 14:29:45,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3100486.6666666665, ans=0.0 2023-11-27 14:29:54,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2023-11-27 14:29:54,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2023-11-27 14:30:01,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.694e+01 9.359e+01 9.958e+01 1.190e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 14:30:02,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3100620.0, ans=0.1 2023-11-27 14:30:07,033 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465100 2023-11-27 14:30:12,882 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8200, loss[loss=0.0708, simple_loss=0.09265, pruned_loss=0.01521, audio_tagging_loss=0.009269, over 15172.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09119, pruned_loss=0.01279, audio_tagging_loss=0.008754, over 3043186.32 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:30:14,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-27 14:30:17,819 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 14:30:23,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3100686.6666666665, ans=0.0 2023-11-27 14:30:35,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3100820.0, ans=0.125 2023-11-27 14:30:37,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3100820.0, ans=0.125 2023-11-27 14:30:42,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3100820.0, ans=0.0 2023-11-27 14:30:53,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3100886.6666666665, ans=0.125 2023-11-27 14:30:59,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3100953.3333333335, ans=0.1 2023-11-27 14:31:05,968 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465150 2023-11-27 14:31:11,295 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8250, loss[loss=0.06965, simple_loss=0.09247, pruned_loss=0.01563, audio_tagging_loss=0.007787, over 15478.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09044, pruned_loss=0.01277, audio_tagging_loss=0.008728, over 3039129.67 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:31:30,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3101086.6666666665, ans=0.125 2023-11-27 14:31:57,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.477e+01 9.033e+01 1.006e+02 1.389e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 14:32:03,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465200 2023-11-27 14:32:08,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3101353.3333333335, ans=0.05 2023-11-27 14:32:09,415 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8300, loss[loss=0.07074, simple_loss=0.09829, pruned_loss=0.01487, audio_tagging_loss=0.006728, over 15025.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08953, pruned_loss=0.01267, audio_tagging_loss=0.008767, over 3039463.98 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:32:11,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3101353.3333333335, ans=0.1 2023-11-27 14:32:17,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3101353.3333333335, ans=0.125 2023-11-27 14:32:41,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.45 vs. 
limit=15.0 2023-11-27 14:32:44,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3101553.3333333335, ans=0.1 2023-11-27 14:33:01,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465250 2023-11-27 14:33:02,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3101620.0, ans=0.09899494936611666 2023-11-27 14:33:06,775 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8350, loss[loss=0.08705, simple_loss=0.1254, pruned_loss=0.01616, audio_tagging_loss=0.008201, over 15277.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09032, pruned_loss=0.01282, audio_tagging_loss=0.008709, over 3045649.04 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:33:10,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3101686.6666666665, ans=0.0 2023-11-27 14:33:13,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3101686.6666666665, ans=0.0 2023-11-27 14:33:53,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.419e+01 8.984e+01 9.870e+01 1.325e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 14:33:59,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2023-11-27 14:33:59,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465300 2023-11-27 14:34:02,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2023-11-27 14:34:05,816 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8400, loss[loss=0.05587, simple_loss=0.08581, pruned_loss=0.00712, audio_tagging_loss=0.005848, over 14842.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09069, pruned_loss=0.01263, audio_tagging_loss=0.008624, over 3054279.77 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:34:24,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3102086.6666666665, ans=0.0 2023-11-27 14:34:27,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3102153.3333333335, ans=0.0 2023-11-27 14:34:28,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3102153.3333333335, ans=0.0 2023-11-27 14:34:33,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3102153.3333333335, ans=0.0 2023-11-27 14:34:35,277 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:34:52,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3102286.6666666665, ans=0.125 2023-11-27 14:34:57,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465350 2023-11-27 14:34:59,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=15.0 2023-11-27 14:35:03,066 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8450, loss[loss=0.05972, simple_loss=0.07648, pruned_loss=0.0107, audio_tagging_loss=0.01077, over 15979.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09044, pruned_loss=0.01264, audio_tagging_loss=0.008634, over 3049665.01 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:35:15,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3102420.0, ans=0.07 2023-11-27 14:35:18,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-11-27 14:35:25,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3102486.6666666665, ans=0.04949747468305833 2023-11-27 14:35:28,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3102486.6666666665, ans=0.2 2023-11-27 14:35:49,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.838e+01 9.408e+01 1.009e+02 1.207e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:35:49,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3102620.0, ans=0.0 2023-11-27 14:35:55,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465400 2023-11-27 14:36:01,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0 2023-11-27 14:36:02,001 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8500, loss[loss=0.06095, simple_loss=0.07957, pruned_loss=0.01166, audio_tagging_loss=0.009501, over 14946.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09028, pruned_loss=0.01255, audio_tagging_loss=0.00866, over 3053768.39 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:36:03,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3102686.6666666665, ans=0.05 2023-11-27 14:36:13,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-11-27 14:36:14,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3102753.3333333335, ans=0.035 2023-11-27 14:36:18,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-27 14:36:26,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102820.0, ans=0.1 2023-11-27 14:36:27,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3102820.0, ans=0.125 2023-11-27 14:36:35,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. 
limit=22.5 2023-11-27 14:36:44,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3102886.6666666665, ans=0.125 2023-11-27 14:36:53,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465450 2023-11-27 14:37:00,394 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8550, loss[loss=0.06511, simple_loss=0.08936, pruned_loss=0.01227, audio_tagging_loss=0.008156, over 16169.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09055, pruned_loss=0.01262, audio_tagging_loss=0.008686, over 3049551.88 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:37:10,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3103086.6666666665, ans=0.0 2023-11-27 14:37:28,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3103153.3333333335, ans=0.0 2023-11-27 14:37:39,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3103220.0, ans=0.1 2023-11-27 14:37:43,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3103220.0, ans=0.0 2023-11-27 14:37:44,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2023-11-27 14:37:47,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.683e+01 9.146e+01 9.913e+01 1.274e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 14:37:47,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3103286.6666666665, ans=0.125 2023-11-27 14:37:52,115 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465500 2023-11-27 14:37:57,534 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8600, loss[loss=0.05312, simple_loss=0.07111, pruned_loss=0.007927, audio_tagging_loss=0.009635, over 15197.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09094, pruned_loss=0.01262, audio_tagging_loss=0.008754, over 3048366.25 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:38:26,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3103486.6666666665, ans=0.125 2023-11-27 14:38:49,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465550 2023-11-27 14:38:54,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3103686.6666666665, ans=0.125 2023-11-27 14:38:55,048 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8650, loss[loss=0.08257, simple_loss=0.1164, pruned_loss=0.01611, audio_tagging_loss=0.008259, over 15075.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09126, pruned_loss=0.01271, audio_tagging_loss=0.00877, over 3050348.70 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:38:59,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.53 vs. 
limit=22.5 2023-11-27 14:38:59,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3103686.6666666665, ans=0.125 2023-11-27 14:39:03,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0 2023-11-27 14:39:40,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3103953.3333333335, ans=0.0 2023-11-27 14:39:42,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.459e+01 9.176e+01 1.006e+02 1.194e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 14:39:48,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465600 2023-11-27 14:39:55,216 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8700, loss[loss=0.07687, simple_loss=0.1048, pruned_loss=0.01524, audio_tagging_loss=0.009243, over 14844.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09044, pruned_loss=0.01255, audio_tagging_loss=0.008947, over 3053551.49 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:39:55,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104020.0, ans=0.1 2023-11-27 14:40:05,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3104086.6666666665, ans=0.0 2023-11-27 14:40:10,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-11-27 14:40:11,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-27 14:40:12,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104086.6666666665, ans=0.1 2023-11-27 14:40:21,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3104153.3333333335, ans=0.125 2023-11-27 14:40:46,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465650 2023-11-27 14:40:51,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104353.3333333335, ans=0.1 2023-11-27 14:40:51,979 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8750, loss[loss=0.07433, simple_loss=0.1051, pruned_loss=0.01523, audio_tagging_loss=0.006529, over 14881.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09155, pruned_loss=0.01276, audio_tagging_loss=0.00889, over 3051200.30 frames. 
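Each optim.py:476 line prints five grad-norm statistics (by appearance, the min, 25th percentile, median, 75th percentile, and max over recent batches) plus a clipping threshold. Throughout this log the threshold is exactly Clipping_scale times the logged median, e.g. 2.0 x 9.176e+01 = 1.835e+02 in the line just above, and percent-clipped stays at 0.0 because no recent norm exceeded it. A hedged sketch of that rule; the actual ScaledAdam bookkeeping in optim.py maintains these statistics incrementally and is more elaborate:

    import torch

    def clip_by_median(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: gradient norms collected from the most recent batches
        threshold = clipping_scale * grad_norms.median()
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return grad_norms.clamp(max=threshold), threshold, percent_clipped

Tying the threshold to a running median makes clipping scale-free: it adapts as gradient magnitudes drift over training instead of relying on a hand-picked constant.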
], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:40:56,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3104353.3333333335, ans=0.5 2023-11-27 14:40:59,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3104353.3333333335, ans=0.5 2023-11-27 14:41:02,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3104420.0, ans=0.125 2023-11-27 14:41:05,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3104420.0, ans=0.0 2023-11-27 14:41:39,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.863e+01 8.832e+01 9.228e+01 1.004e+02 1.241e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 14:41:43,943 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465700 2023-11-27 14:41:46,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3104620.0, ans=0.2 2023-11-27 14:41:48,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3104686.6666666665, ans=0.125 2023-11-27 14:41:49,419 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8800, loss[loss=0.08415, simple_loss=0.123, pruned_loss=0.01722, audio_tagging_loss=0.005444, over 15731.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09106, pruned_loss=0.0126, audio_tagging_loss=0.008949, over 3048522.43 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:15,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.20 vs. limit=15.0 2023-11-27 14:42:24,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2023-11-27 14:42:41,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465750 2023-11-27 14:42:47,787 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8850, loss[loss=0.07813, simple_loss=0.1069, pruned_loss=0.01598, audio_tagging_loss=0.008705, over 15275.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09151, pruned_loss=0.01272, audio_tagging_loss=0.008995, over 3043503.32 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:52,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-27 14:42:53,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3105020.0, ans=10.0 2023-11-27 14:42:59,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3105086.6666666665, ans=0.125 2023-11-27 14:43:03,126 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:43:07,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3105086.6666666665, ans=0.125 2023-11-27 14:43:17,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3105153.3333333335, ans=0.2 2023-11-27 14:43:28,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3105220.0, ans=0.125 2023-11-27 14:43:35,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.774e+01 9.216e+01 1.007e+02 1.244e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 14:43:40,114 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465800 2023-11-27 14:43:41,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3105286.6666666665, ans=0.125 2023-11-27 14:43:44,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3105353.3333333335, ans=0.125 2023-11-27 14:43:45,768 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8900, loss[loss=0.07932, simple_loss=0.1124, pruned_loss=0.01564, audio_tagging_loss=0.007503, over 16605.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09214, pruned_loss=0.01282, audio_tagging_loss=0.008887, over 3048855.56 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:43:58,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3105420.0, ans=0.0 2023-11-27 14:44:08,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3105486.6666666665, ans=0.125 2023-11-27 14:44:20,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.98 vs. limit=22.5 2023-11-27 14:44:21,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3105553.3333333335, ans=0.2 2023-11-27 14:44:36,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465850 2023-11-27 14:44:42,243 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 8950, loss[loss=0.07367, simple_loss=0.09943, pruned_loss=0.01353, audio_tagging_loss=0.01043, over 15763.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.0919, pruned_loss=0.01282, audio_tagging_loss=0.008746, over 3049834.38 frames. 
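The grad_scale value printed with each loss line oscillates between 16.0 and 32.0 through this stretch. That is the signature of fp16 training with a dynamic loss scaler: the scaler periodically doubles its scale, then halves it again whenever a scaled gradient overflows. A hedged sketch of the standard torch.cuda.amp loop that produces this bookkeeping; model, optimizer, batch, and criterion are placeholder names, and the real loop in train_asr.py does considerably more:

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def training_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # halve on overflow, else grow periodically
        return loss.detach()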
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:44:42,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3105686.6666666665, ans=0.125 2023-11-27 14:44:52,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3105753.3333333335, ans=0.125 2023-11-27 14:45:20,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3105886.6666666665, ans=0.1 2023-11-27 14:45:29,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.633e+01 9.411e+01 1.036e+02 1.341e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:45:34,030 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465900 2023-11-27 14:45:39,980 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9000, loss[loss=0.08345, simple_loss=0.1179, pruned_loss=0.01578, audio_tagging_loss=0.008724, over 15599.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.0919, pruned_loss=0.01286, audio_tagging_loss=0.008693, over 3043703.98 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:45:39,980 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 14:46:15,058 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05878, simple_loss=0.0507, pruned_loss=0.005237, audio_tagging_loss=0.02819, over 4681554.00 frames. 2023-11-27 14:46:15,059 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 14:46:22,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3106020.0, ans=0.0 2023-11-27 14:46:25,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3106086.6666666665, ans=0.125 2023-11-27 14:46:58,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=22.5 2023-11-27 14:46:59,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3106220.0, ans=0.2 2023-11-27 14:47:06,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 465950 2023-11-27 14:47:11,967 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9050, loss[loss=0.06263, simple_loss=0.07621, pruned_loss=0.01363, audio_tagging_loss=0.01089, over 14918.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09107, pruned_loss=0.01281, audio_tagging_loss=0.008695, over 3048807.07 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:47:38,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3106486.6666666665, ans=0.125 2023-11-27 14:47:59,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.800e+01 9.356e+01 9.893e+01 1.212e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 14:48:04,100 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466000 2023-11-27 14:48:07,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3106620.0, ans=0.2 2023-11-27 14:48:10,424 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9100, loss[loss=0.05787, simple_loss=0.08106, pruned_loss=0.01029, audio_tagging_loss=0.007047, over 15805.00 frames. 
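Batch 9000 above triggers a full validation pass (as batch 0 did at the start of the epoch), while ordinary training-loss lines appear every 50 batches (7550, 7600, 7650, ...). This cadence matches the log_interval and valid_interval settings in the config dump; schematically:

    def should_log(batch_idx: int, log_interval: int = 50) -> bool:
        return batch_idx % log_interval == 0

    def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
        # batches 0, 3000, 6000, 9000, ... as seen in this log
        return batch_idx % valid_interval == 0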
], tot_loss[loss=0.067, simple_loss=0.09101, pruned_loss=0.01287, audio_tagging_loss=0.008621, over 3055291.98 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:48:24,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3106753.3333333335, ans=0.0 2023-11-27 14:48:31,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.70 vs. limit=15.0 2023-11-27 14:48:38,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3106820.0, ans=0.125 2023-11-27 14:48:51,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3106886.6666666665, ans=15.0 2023-11-27 14:49:03,722 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466050 2023-11-27 14:49:09,111 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9150, loss[loss=0.06664, simple_loss=0.08826, pruned_loss=0.01249, audio_tagging_loss=0.01002, over 15262.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09091, pruned_loss=0.01282, audio_tagging_loss=0.008586, over 3051894.85 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:49:14,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2023-11-27 14:49:15,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3107020.0, ans=0.0 2023-11-27 14:49:34,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3107153.3333333335, ans=0.125 2023-11-27 14:49:44,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3107220.0, ans=0.125 2023-11-27 14:49:57,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.571e+01 9.032e+01 9.849e+01 1.366e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:50:01,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466100 2023-11-27 14:50:06,684 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9200, loss[loss=0.06612, simple_loss=0.09288, pruned_loss=0.009896, audio_tagging_loss=0.009789, over 16061.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09086, pruned_loss=0.01272, audio_tagging_loss=0.008512, over 3058468.16 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:50:10,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3107353.3333333335, ans=0.125 2023-11-27 14:50:17,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2023-11-27 14:50:22,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. 
limit=6.0 2023-11-27 14:50:31,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3107486.6666666665, ans=0.125 2023-11-27 14:50:32,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3107486.6666666665, ans=0.125 2023-11-27 14:50:46,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3107553.3333333335, ans=0.125 2023-11-27 14:50:54,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3107620.0, ans=0.1 2023-11-27 14:50:58,616 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466150 2023-11-27 14:51:00,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3107620.0, ans=0.125 2023-11-27 14:51:04,612 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9250, loss[loss=0.08209, simple_loss=0.1143, pruned_loss=0.01797, audio_tagging_loss=0.006968, over 15822.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09079, pruned_loss=0.01262, audio_tagging_loss=0.008495, over 3060774.70 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:51:35,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-27 14:51:37,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3107820.0, ans=0.125 2023-11-27 14:51:38,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3107886.6666666665, ans=0.0 2023-11-27 14:51:54,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-11-27 14:51:55,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.711e+01 9.233e+01 9.983e+01 1.314e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 14:51:57,584 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466200 2023-11-27 14:52:03,311 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9300, loss[loss=0.0828, simple_loss=0.1171, pruned_loss=0.01689, audio_tagging_loss=0.007348, over 14925.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09102, pruned_loss=0.01271, audio_tagging_loss=0.008496, over 3061539.22 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:52:06,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3108020.0, ans=0.0 2023-11-27 14:52:12,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. 
limit=15.0 2023-11-27 14:52:19,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3108086.6666666665, ans=0.1 2023-11-27 14:52:20,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3108086.6666666665, ans=0.125 2023-11-27 14:52:37,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3108220.0, ans=0.125 2023-11-27 14:52:54,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3108286.6666666665, ans=0.0 2023-11-27 14:52:54,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466250 2023-11-27 14:53:00,966 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9350, loss[loss=0.0617, simple_loss=0.08302, pruned_loss=0.01153, audio_tagging_loss=0.008655, over 14712.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09093, pruned_loss=0.01262, audio_tagging_loss=0.008546, over 3063314.83 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:53:04,583 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:53:06,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3108353.3333333335, ans=0.04949747468305833 2023-11-27 14:53:22,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3108420.0, ans=0.125 2023-11-27 14:53:35,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2023-11-27 14:53:45,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3108620.0, ans=0.0 2023-11-27 14:53:49,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.694e+01 9.307e+01 9.983e+01 1.185e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 14:53:52,264 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466300 2023-11-27 14:53:53,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3108620.0, ans=0.5 2023-11-27 14:53:58,193 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9400, loss[loss=0.07034, simple_loss=0.08733, pruned_loss=0.01859, audio_tagging_loss=0.00808, over 15992.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09086, pruned_loss=0.01278, audio_tagging_loss=0.008687, over 3055968.22 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:54:05,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3108686.6666666665, ans=0.0 2023-11-27 14:54:51,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466350 2023-11-27 14:54:53,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3108953.3333333335, ans=0.125 2023-11-27 14:54:56,784 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9450, loss[loss=0.06735, simple_loss=0.09114, pruned_loss=0.01165, audio_tagging_loss=0.01013, over 14666.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09119, pruned_loss=0.01268, audio_tagging_loss=0.008839, over 3052082.30 frames. 
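Most lines in this section are scaling.py:213 entries reporting the current value ("ans") of a ScheduledFloat: a hyperparameter whose value is a function of batch_count rather than a constant. By batch ~3.1M every schedule shown has flattened out at its endpoint (dropout_p at 0.1, balancer probs at 0.125, skip rates at 0.0). A hedged sketch of such a piecewise-linear schedule; the breakpoints below are invented for illustration and are not the ones used in this run:

    import bisect

    class PiecewiseLinear:
        """Value interpolated linearly between (batch_count, value) points."""

        def __init__(self, *points):
            self.x, self.y = zip(*sorted(points))

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect.bisect_right(self.x, batch_count)
            x0, x1 = self.x[i - 1], self.x[i]
            y0, y1 = self.y[i - 1], self.y[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinear((0, 0.3), (20000, 0.1))  # invented breakpoints
    print(dropout_p(3_096_020.0))  # 0.1, i.e. the "ans=0.1" end value in the log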
], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:55:00,143 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:55:08,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2023-11-27 14:55:31,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3109220.0, ans=0.2 2023-11-27 14:55:43,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3109286.6666666665, ans=0.125 2023-11-27 14:55:46,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.803e+01 9.221e+01 9.903e+01 1.293e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 14:55:49,046 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466400 2023-11-27 14:55:49,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3109286.6666666665, ans=0.09899494936611666 2023-11-27 14:55:54,762 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9500, loss[loss=0.0679, simple_loss=0.08758, pruned_loss=0.01544, audio_tagging_loss=0.008675, over 15424.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09089, pruned_loss=0.01274, audio_tagging_loss=0.008937, over 3042500.19 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:56:05,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3109420.0, ans=0.0 2023-11-27 14:56:07,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3109420.0, ans=0.125 2023-11-27 14:56:08,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2023-11-27 14:56:29,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.62 vs. limit=22.5 2023-11-27 14:56:34,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3109553.3333333335, ans=0.0 2023-11-27 14:56:44,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2023-11-27 14:56:47,023 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466450 2023-11-27 14:56:52,516 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9550, loss[loss=0.07772, simple_loss=0.1029, pruned_loss=0.01429, audio_tagging_loss=0.01198, over 15513.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09097, pruned_loss=0.01263, audio_tagging_loss=0.008999, over 3038711.59 frames. 
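The WARNING above is the recurring pattern for 1.000-second AudioSet cuts: their placeholder transcript tokenizes to 24 BPE pieces, but the 100 input frames survive only as 23 frames after the encoder's ~4x subsampling, and a transducer cannot emit more tokens than it has output frames. A hedged sketch of the implied filter; the exact subsampling arithmetic below is an assumption chosen to reproduce the logged 100 -> 23 pair:

    def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
        # assumed Conv2dSubsampling-style reduction matching the warnings:
        # "(before subsampling): 100 ... (after subsampling): 23"
        num_output_frames = ((num_input_frames - 7) // 2 + 1) // 2
        return num_output_frames >= num_tokens

    assert keep_cut(100, 24) is False  # these cuts are excluded from training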
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:57:04,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-27 14:57:05,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-27 14:57:11,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-27 14:57:27,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-27 14:57:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3109953.3333333335, ans=0.0 2023-11-27 14:57:42,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.709e+01 9.251e+01 1.020e+02 1.249e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 14:57:45,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466500 2023-11-27 14:57:51,337 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9600, loss[loss=0.0784, simple_loss=0.1102, pruned_loss=0.0156, audio_tagging_loss=0.007718, over 15650.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09052, pruned_loss=0.01261, audio_tagging_loss=0.009139, over 3045342.07 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:58:18,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3110153.3333333335, ans=0.125 2023-11-27 14:58:35,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3110220.0, ans=0.1 2023-11-27 14:58:36,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3110286.6666666665, ans=0.125 2023-11-27 14:58:42,784 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466550 2023-11-27 14:58:48,147 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9650, loss[loss=0.07648, simple_loss=0.1023, pruned_loss=0.0192, audio_tagging_loss=0.00612, over 15716.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08954, pruned_loss=0.01262, audio_tagging_loss=0.009154, over 3040966.93 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:58:50,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-27 14:58:52,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3110353.3333333335, ans=0.125 2023-11-27 14:59:37,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.987e+01 9.663e+01 1.056e+02 1.477e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 14:59:39,689 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466600 2023-11-27 14:59:46,072 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9700, loss[loss=0.06686, simple_loss=0.09675, pruned_loss=0.0115, audio_tagging_loss=0.006986, over 15122.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08973, pruned_loss=0.01263, audio_tagging_loss=0.009034, over 3048129.00 frames. 
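The scaling.py:1022 lines report a per-module "whitening" diagnostic: a statistic of the activation covariance that is small when channels are decorrelated with roughly equal variance and grows as the spectrum spreads out. The module only intervenes while the metric exceeds its limit, and every value logged in this section sits below its limit (e.g. 6.64 vs. 15.0 above). The sketch below is a proxy in the same spirit, equal to 1.0 for a perfectly white covariance; it is not the exact formula icefall uses:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels), assumed zero-mean activations
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # covariance spectrum
        n = eigs.numel()
        # 1.0 when all eigenvalues are equal; larger as they spread out
        return n * (eigs ** 2).sum() / eigs.sum() ** 2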
2023-11-27 14:59:52,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0
2023-11-27 15:00:14,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3110820.0, ans=0.2
2023-11-27 15:00:15,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0
2023-11-27 15:00:24,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0
2023-11-27 15:00:38,196 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466650
2023-11-27 15:00:39,961 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:00:44,712 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9750, loss[loss=0.07588, simple_loss=0.09725, pruned_loss=0.01755, audio_tagging_loss=0.009706, over 15329.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08991, pruned_loss=0.01256, audio_tagging_loss=0.008891, over 3048884.50 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:00:50,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3111020.0, ans=0.1
2023-11-27 15:01:03,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0
2023-11-27 15:01:32,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3111286.6666666665, ans=0.125
2023-11-27 15:01:35,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.575e+01 9.224e+01 9.953e+01 1.254e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-27 15:01:36,585 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466700
2023-11-27 15:01:41,863 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9800, loss[loss=0.07315, simple_loss=0.1137, pruned_loss=0.00955, audio_tagging_loss=0.006768, over 15901.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09065, pruned_loss=0.01262, audio_tagging_loss=0.008775, over 3042582.68 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
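Every ScheduledFloat line reports the momentary value (ans=...) of a hyperparameter scheduled against the global batch count: dropout probabilities parked at 0.1, balancer probabilities at 0.125, skip rates decayed to 0.0, and so on. A rough sketch of piecewise-linear scheduling in this spirit (not the actual scaling.py class, which is richer and tracks the batch count internally):

```python
def scheduled_float(points, batch_count):
    """Piecewise-linear value against the global batch count.
    points: sorted (batch_count, value) pairs, e.g. [(0.0, 0.3), (20000.0, 0.1)].
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)

# By batch_count ~ 3.11e6 every schedule above is long past its final
# breakpoint, which is why the logged ans values sit at constants such as
# 0.125, 0.1 and 0.0.
```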
2023-11-27 15:01:47,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3111353.3333333335, ans=0.125
2023-11-27 15:01:55,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3111420.0, ans=0.0
2023-11-27 15:02:08,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125
2023-11-27 15:02:11,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125
2023-11-27 15:02:13,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3111486.6666666665, ans=0.2
2023-11-27 15:02:33,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466750
2023-11-27 15:02:36,212 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:02:36,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3111620.0, ans=0.125
2023-11-27 15:02:38,366 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9850, loss[loss=0.07674, simple_loss=0.1189, pruned_loss=0.01136, audio_tagging_loss=0.005932, over 14491.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09191, pruned_loss=0.01283, audio_tagging_loss=0.008603, over 3044518.46 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:02:55,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0
2023-11-27 15:03:04,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3111820.0, ans=0.05
2023-11-27 15:03:17,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3111886.6666666665, ans=0.2
2023-11-27 15:03:20,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0
2023-11-27 15:03:29,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.745e+01 9.208e+01 1.002e+02 1.325e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-27 15:03:30,660 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466800
2023-11-27 15:03:35,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111953.3333333335, ans=0.1
2023-11-27 15:03:36,887 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9900, loss[loss=0.04909, simple_loss=0.06032, pruned_loss=0.008284, audio_tagging_loss=0.01064, over 14657.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09119, pruned_loss=0.01267, audio_tagging_loss=0.008649, over 3047138.34 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0
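The loss fields in these lines are consistent with a fixed weighted sum of the three components, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; the 0.5 and 1.0 weights are inferred here from the logged values themselves. Checking against the batch 9900 per-batch numbers just above:

```python
simple_loss, pruned_loss, audio_tagging_loss = 0.06032, 0.008284, 0.01064
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.04908, matching the logged loss=0.04909 up to rounding
```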
2023-11-27 15:03:40,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3112020.0, ans=0.2
2023-11-27 15:03:45,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3112020.0, ans=0.125
2023-11-27 15:03:56,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3112086.6666666665, ans=0.0
2023-11-27 15:04:06,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3112153.3333333335, ans=0.125
2023-11-27 15:04:07,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112153.3333333335, ans=0.1
2023-11-27 15:04:26,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:04:28,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3112286.6666666665, ans=0.125
2023-11-27 15:04:29,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466850
2023-11-27 15:04:31,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3112286.6666666665, ans=0.125
2023-11-27 15:04:34,672 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 9950, loss[loss=0.06726, simple_loss=0.08971, pruned_loss=0.01158, audio_tagging_loss=0.01082, over 15186.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09121, pruned_loss=0.01266, audio_tagging_loss=0.008614, over 3047271.38 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:04:50,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3112420.0, ans=0.125
2023-11-27 15:04:52,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3112420.0, ans=0.125
2023-11-27 15:04:53,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3112420.0, ans=0.0
2023-11-27 15:05:02,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3112486.6666666665, ans=10.0
2023-11-27 15:05:18,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3112553.3333333335, ans=0.125
2023-11-27 15:05:23,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0
2023-11-27 15:05:24,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.583e+01 9.133e+01 9.835e+01 1.182e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-27 15:05:25,947 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466900
2023-11-27 15:05:31,328 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10000, loss[loss=0.06466, simple_loss=0.09308, pruned_loss=0.01023, audio_tagging_loss=0.007895, over 14294.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09105, pruned_loss=0.01261, audio_tagging_loss=0.008602, over 3036393.65 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0
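simple_loss and pruned_loss are the two passes of k2's pruned RNN-T: a cheap joiner over linear encoder/decoder embeddings gives a smoothed "simple" loss whose gradients define a pruning window of s_range symbols per frame, and the full joiner is then evaluated only inside that window. A hedged sketch of the k2 calls involved (argument lists abbreviated from memory of the k2 API; the joiner argument is a hypothetical module; check the k2.rnnt_loss_* documentation before relying on this):

```python
import k2
import torch

def pruned_transducer_losses(
    encoder_embed: torch.Tensor,  # (N, T, C) projected encoder output ("am")
    decoder_embed: torch.Tensor,  # (N, U+1, C) projected decoder output ("lm")
    y_padded: torch.Tensor,       # (N, U) padded label ids
    boundary: torch.Tensor,       # (N, 4) per-sequence [0, 0, U, T]
    joiner,                       # hypothetical full joiner module
    blank_id: int = 0,
    s_range: int = 5,
):
    # Pass 1: smoothed "simple" loss, with gradients w.r.t. the px/py
    # arrays that k2 uses to choose the pruning window.
    simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
        lm=decoder_embed, am=encoder_embed, symbols=y_padded,
        termination_symbol=blank_id, boundary=boundary, return_grad=True,
    )
    # Pruning window of s_range symbols per frame.
    ranges = k2.get_rnnt_prune_ranges(px_grad, py_grad, boundary, s_range=s_range)
    am_pruned, lm_pruned = k2.do_rnnt_pruning(
        am=encoder_embed, lm=decoder_embed, ranges=ranges
    )
    logits = joiner(am_pruned, lm_pruned)
    # Pass 2: exact loss restricted to the pruned region.
    pruned_loss = k2.rnnt_loss_pruned(
        logits=logits, symbols=y_padded, ranges=ranges,
        termination_symbol=blank_id, boundary=boundary,
    )
    return simple_loss, pruned_loss
```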
2023-11-27 15:05:31,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3112686.6666666665, ans=0.05
2023-11-27 15:05:34,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3112686.6666666665, ans=0.09899494936611666
2023-11-27 15:05:55,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3112820.0, ans=0.0
2023-11-27 15:06:23,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 466950
2023-11-27 15:06:29,082 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10050, loss[loss=0.06113, simple_loss=0.08618, pruned_loss=0.01075, audio_tagging_loss=0.007283, over 15023.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09019, pruned_loss=0.01244, audio_tagging_loss=0.008666, over 3038680.87 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:06:34,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0
2023-11-27 15:06:40,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5
2023-11-27 15:06:40,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.17 vs. limit=10.0
2023-11-27 15:06:43,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3113086.6666666665, ans=0.125
2023-11-27 15:06:47,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3113086.6666666665, ans=0.125
2023-11-27 15:07:10,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3113220.0, ans=0.125
2023-11-27 15:07:20,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.460e+01 9.017e+01 9.705e+01 1.338e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-27 15:07:21,483 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467000
2023-11-27 15:07:27,159 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10100, loss[loss=0.06335, simple_loss=0.08559, pruned_loss=0.01079, audio_tagging_loss=0.009764, over 15218.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09076, pruned_loss=0.01244, audio_tagging_loss=0.008699, over 3047489.63 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:07:33,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0
2023-11-27 15:08:08,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3113553.3333333335, ans=0.125
2023-11-27 15:08:16,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3113620.0, ans=0.035
2023-11-27 15:08:17,390 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:08:18,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467050
2023-11-27 15:08:23,952 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10150, loss[loss=0.07282, simple_loss=0.1024, pruned_loss=0.01438, audio_tagging_loss=0.007265, over 15508.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09068, pruned_loss=0.01236, audio_tagging_loss=0.008806, over 3048601.39 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:08:41,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3113753.3333333335, ans=0.125
2023-11-27 15:08:55,799 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:09:00,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3113886.6666666665, ans=0.04949747468305833
2023-11-27 15:09:14,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.492e+01 9.148e+01 9.869e+01 1.257e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-27 15:09:15,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467100
2023-11-27 15:09:21,476 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10200, loss[loss=0.09096, simple_loss=0.1285, pruned_loss=0.02212, audio_tagging_loss=0.004568, over 15451.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08975, pruned_loss=0.01228, audio_tagging_loss=0.008924, over 3043022.75 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:09:21,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3114020.0, ans=0.125
2023-11-27 15:09:25,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3114020.0, ans=0.125
2023-11-27 15:09:37,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3114086.6666666665, ans=0.125
2023-11-27 15:09:40,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3114086.6666666665, ans=0.1
2023-11-27 15:09:40,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3114086.6666666665, ans=0.2
2023-11-27 15:09:43,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114086.6666666665, ans=0.1
2023-11-27 15:09:45,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3114153.3333333335, ans=0.125
2023-11-27 15:09:48,641 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:09:50,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0
2023-11-27 15:09:52,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3114153.3333333335, ans=0.125
2023-11-27 15:09:58,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3114220.0, ans=0.125
2023-11-27 15:10:02,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0
2023-11-27 15:10:05,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3114220.0, ans=0.04949747468305833
2023-11-27 15:10:14,284 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467150
2023-11-27 15:10:15,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3114286.6666666665, ans=0.0
2023-11-27 15:10:20,424 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10250, loss[loss=0.08393, simple_loss=0.109, pruned_loss=0.02111, audio_tagging_loss=0.00833, over 16054.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0907, pruned_loss=0.01259, audio_tagging_loss=0.008937, over 3050233.60 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:10:34,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3114420.0, ans=0.05
2023-11-27 15:10:36,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3114420.0, ans=0.0
2023-11-27 15:10:46,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3114486.6666666665, ans=0.0
2023-11-27 15:10:56,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3114553.3333333335, ans=0.0
2023-11-27 15:11:08,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3114620.0, ans=0.2
2023-11-27 15:11:11,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.883e+01 9.540e+01 1.004e+02 1.419e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 15:11:11,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467200
2023-11-27 15:11:17,257 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10300, loss[loss=0.06731, simple_loss=0.0903, pruned_loss=0.009387, audio_tagging_loss=0.01278, over 14385.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09081, pruned_loss=0.01244, audio_tagging_loss=0.009062, over 3048744.60 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
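Each optim.py line summarizes recent gradient norms as five order statistics (min, 25%, median, 75%, max) plus the clipping threshold and the percentage of batches clipped; in every line here the threshold is exactly clipping_scale times the median, e.g. 2.0 * 9.540e+01 = 1.908e+02 in the entry above. A sketch of that bookkeeping (illustrative only, not ScaledAdam's actual clipping code):

```python
import torch

def summarize_grad_norms(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D float tensor of recent per-batch gradient norms.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]              # assumed: scale x median
    percent_clipped = (grad_norms > threshold).float().mean() * 100.0
    print(
        f"Clipping_scale={clipping_scale}, grad-norm quartiles "
        + " ".join(f"{v:.3e}" for v in q.tolist())
        + f", threshold={float(threshold):.3e}"
        + f", percent-clipped={float(percent_clipped):.1f}"
    )
    return threshold
```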
2023-11-27 15:11:38,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3114753.3333333335, ans=0.1
2023-11-27 15:11:47,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3114820.0, ans=0.125
2023-11-27 15:11:53,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3114886.6666666665, ans=0.2
2023-11-27 15:12:01,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3114886.6666666665, ans=0.125
2023-11-27 15:12:04,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3114953.3333333335, ans=0.125
2023-11-27 15:12:09,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467250
2023-11-27 15:12:14,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3115020.0, ans=0.0
2023-11-27 15:12:14,908 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10350, loss[loss=0.0659, simple_loss=0.08328, pruned_loss=0.01158, audio_tagging_loss=0.01268, over 16684.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09138, pruned_loss=0.01271, audio_tagging_loss=0.009077, over 3057763.52 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:12:16,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3115020.0, ans=0.125
2023-11-27 15:12:26,018 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:12:33,206 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:12:37,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0
2023-11-27 15:12:38,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3115153.3333333335, ans=0.125
2023-11-27 15:12:50,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3115220.0, ans=0.125
2023-11-27 15:13:07,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.698e+01 9.408e+01 1.024e+02 1.336e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 15:13:07,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467300
2023-11-27 15:13:10,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0
2023-11-27 15:13:11,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3115286.6666666665, ans=0.125
2023-11-27 15:13:13,065 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10400, loss[loss=0.06671, simple_loss=0.0889, pruned_loss=0.01154, audio_tagging_loss=0.01072, over 15057.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.091, pruned_loss=0.01268, audio_tagging_loss=0.009227, over 3059497.58 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:13:27,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.50 vs. limit=12.0
2023-11-27 15:13:46,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0
2023-11-27 15:13:55,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115553.3333333335, ans=0.1
2023-11-27 15:13:56,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115553.3333333335, ans=0.1
2023-11-27 15:14:02,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3115620.0, ans=0.95
2023-11-27 15:14:05,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467350
2023-11-27 15:14:10,445 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10450, loss[loss=0.07798, simple_loss=0.1143, pruned_loss=0.01547, audio_tagging_loss=0.005367, over 16372.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09128, pruned_loss=0.0128, audio_tagging_loss=0.009087, over 3058932.36 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:14:11,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3115686.6666666665, ans=0.125
2023-11-27 15:14:41,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0
2023-11-27 15:15:02,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.776e+01 9.506e+01 1.066e+02 3.679e+02, threshold=1.901e+02, percent-clipped=1.0
2023-11-27 15:15:02,178 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467400
2023-11-27 15:15:08,015 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10500, loss[loss=0.0706, simple_loss=0.08881, pruned_loss=0.01627, audio_tagging_loss=0.009922, over 14900.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09071, pruned_loss=0.01274, audio_tagging_loss=0.009029, over 3052515.55 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 15:15:21,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3116086.6666666665, ans=0.1
2023-11-27 15:15:31,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3116153.3333333335, ans=0.0
2023-11-27 15:15:34,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3116153.3333333335, ans=0.0
2023-11-27 15:15:43,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3116220.0, ans=0.0
2023-11-27 15:15:54,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0
2023-11-27 15:16:00,292 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467450
2023-11-27 15:16:06,290 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10550, loss[loss=0.05274, simple_loss=0.0656, pruned_loss=0.01209, audio_tagging_loss=0.007851, over 14104.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09078, pruned_loss=0.01274, audio_tagging_loss=0.008979, over 3045620.71 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:16:16,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3116420.0, ans=0.5
2023-11-27 15:16:16,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0
2023-11-27 15:16:25,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3116420.0, ans=0.0
2023-11-27 15:16:51,118 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:16:57,565 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467500
2023-11-27 15:16:58,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.619e+01 9.191e+01 9.903e+01 2.574e+02, threshold=1.838e+02, percent-clipped=2.0
2023-11-27 15:17:02,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3116686.6666666665, ans=0.125
2023-11-27 15:17:03,607 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10600, loss[loss=0.06571, simple_loss=0.09464, pruned_loss=0.0107, audio_tagging_loss=0.007694, over 15089.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09016, pruned_loss=0.01267, audio_tagging_loss=0.008832, over 3043418.27 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:17:04,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3116686.6666666665, ans=0.2
2023-11-27 15:17:07,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3116686.6666666665, ans=0.0
2023-11-27 15:17:18,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3116753.3333333335, ans=0.1
2023-11-27 15:17:33,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=12.0
2023-11-27 15:17:51,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3116953.3333333335, ans=0.125
2023-11-27 15:17:53,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0
2023-11-27 15:17:55,366 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467550
2023-11-27 15:18:00,707 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10650, loss[loss=0.0547, simple_loss=0.07504, pruned_loss=0.00842, audio_tagging_loss=0.008761, over 14905.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09049, pruned_loss=0.01285, audio_tagging_loss=0.008775, over 3037962.79 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
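The Whitening lines compare a per-module anisotropy metric against a limit; the metric is normalized so that a perfectly white (identity-like) channel covariance scores 1.0, and the whitening penalty only engages once the metric exceeds the limit, which is why entries such as metric=8.09 vs. limit=10.0 above carry no penalty. A toy reconstruction of such a metric (an approximation of what I take scaling.py's _whitening_metric to compute; verify against the source before reuse):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns 1.0 when the channel
    covariance is a multiple of the identity, and grows as channels
    become correlated or unevenly scaled."""
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x)      # (num_groups, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
    return float((cov ** 2).sum() / (mean_diag ** 2 * num_channels))
```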
2023-11-27 15:18:04,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3117020.0, ans=0.125
2023-11-27 15:18:04,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3117020.0, ans=0.0
2023-11-27 15:18:11,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3117086.6666666665, ans=0.0
2023-11-27 15:18:12,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0
2023-11-27 15:18:14,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3117086.6666666665, ans=0.125
2023-11-27 15:18:18,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3117086.6666666665, ans=0.125
2023-11-27 15:18:18,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3117086.6666666665, ans=0.0
2023-11-27 15:18:21,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3117086.6666666665, ans=0.0
2023-11-27 15:18:36,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0
2023-11-27 15:18:41,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3117220.0, ans=0.125
2023-11-27 15:18:41,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3117220.0, ans=0.125
2023-11-27 15:18:46,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3117286.6666666665, ans=0.125
2023-11-27 15:18:52,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467600
2023-11-27 15:18:55,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.520e+01 9.175e+01 9.888e+01 1.340e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-27 15:18:58,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3117353.3333333335, ans=0.125
2023-11-27 15:18:59,328 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10700, loss[loss=0.05302, simple_loss=0.07272, pruned_loss=0.008121, audio_tagging_loss=0.008535, over 15437.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09072, pruned_loss=0.01273, audio_tagging_loss=0.008738, over 3044433.13 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0
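The grad_scale field in the loss lines is the dynamic loss scale of fp16 mixed-precision training: it is halved after a step whose scaled gradients overflow and grown again after a stretch of stable steps, which is why it falls from 32.0 through 16.0 to 8.0 around batches 10500-10700 above and recovers later. The standard PyTorch pattern (a generic sketch; model, optimizer and batch are placeholders):

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)            # forward in fp16
    scaler.scale(loss).backward()      # scale up to avoid fp16 underflow
    scaler.step(optimizer)             # unscales; skips the step on inf/nan
    scaler.update()                    # halve the scale on overflow, else
                                       # grow it every growth_interval steps
    return loss.detach()

# scaler.get_scale() is the "grad_scale" value printed in the loss lines.
```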
2023-11-27 15:19:02,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3117353.3333333335, ans=0.035
2023-11-27 15:19:27,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3117486.6666666665, ans=0.125
2023-11-27 15:19:49,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3117620.0, ans=0.125
2023-11-27 15:19:50,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467650
2023-11-27 15:19:56,286 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10750, loss[loss=0.06931, simple_loss=0.0885, pruned_loss=0.01632, audio_tagging_loss=0.008737, over 14714.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0908, pruned_loss=0.01284, audio_tagging_loss=0.008708, over 3043643.02 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:20:16,187 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:20:35,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3117886.6666666665, ans=0.1
2023-11-27 15:20:40,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3117953.3333333335, ans=0.0
2023-11-27 15:20:45,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3117953.3333333335, ans=0.0
2023-11-27 15:20:47,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467700
2023-11-27 15:20:49,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.438e+01 8.439e+01 9.244e+01 9.878e+01 1.512e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 15:20:53,263 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10800, loss[loss=0.06149, simple_loss=0.07625, pruned_loss=0.009996, audio_tagging_loss=0.01337, over 15054.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09065, pruned_loss=0.01275, audio_tagging_loss=0.00862, over 3045916.46 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 15:20:53,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3118020.0, ans=0.125
2023-11-27 15:21:01,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0
2023-11-27 15:21:11,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3118086.6666666665, ans=0.125
2023-11-27 15:21:37,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3118220.0, ans=0.125
2023-11-27 15:21:44,964 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467750
2023-11-27 15:21:45,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0
2023-11-27 15:21:50,811 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10850, loss[loss=0.06346, simple_loss=0.07858, pruned_loss=0.0114, audio_tagging_loss=0.01276, over 14889.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0906, pruned_loss=0.0128, audio_tagging_loss=0.008612, over 3040914.03 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:21:57,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118353.3333333335, ans=0.1
2023-11-27 15:22:00,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.62 vs. limit=22.5
2023-11-27 15:22:25,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3118553.3333333335, ans=0.125
2023-11-27 15:22:27,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118553.3333333335, ans=0.1
2023-11-27 15:22:30,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3118553.3333333335, ans=0.0
2023-11-27 15:22:36,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118620.0, ans=0.1
2023-11-27 15:22:43,101 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467800
2023-11-27 15:22:46,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.986e+01 9.693e+01 1.013e+02 1.433e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-27 15:22:48,722 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10900, loss[loss=0.06138, simple_loss=0.08627, pruned_loss=0.01129, audio_tagging_loss=0.00695, over 15360.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09116, pruned_loss=0.01287, audio_tagging_loss=0.008619, over 3049535.44 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:22:49,822 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:23:13,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0
2023-11-27 15:23:18,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118820.0, ans=0.1
2023-11-27 15:23:30,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0
2023-11-27 15:23:35,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3118953.3333333335, ans=0.2
2023-11-27 15:23:40,194 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467850
2023-11-27 15:23:43,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3118953.3333333335, ans=0.125
2023-11-27 15:23:45,538 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 10950, loss[loss=0.07032, simple_loss=0.0966, pruned_loss=0.01339, audio_tagging_loss=0.008629, over 15506.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09159, pruned_loss=0.01287, audio_tagging_loss=0.008639, over 3050499.70 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:24:12,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3119153.3333333335, ans=0.125
2023-11-27 15:24:12,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3119153.3333333335, ans=0.0
2023-11-27 15:24:23,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2023-11-27 15:24:23,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3119220.0, ans=0.0
2023-11-27 15:24:28,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0
2023-11-27 15:24:28,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3119220.0, ans=0.04949747468305833
2023-11-27 15:24:37,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467900
2023-11-27 15:24:40,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.952e+01 9.286e+01 1.000e+02 1.370e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 15:24:42,831 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11000, loss[loss=0.05514, simple_loss=0.0749, pruned_loss=0.008987, audio_tagging_loss=0.008698, over 14747.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09133, pruned_loss=0.01284, audio_tagging_loss=0.008667, over 3045251.93 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:24:44,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3119353.3333333335, ans=0.0
2023-11-27 15:24:57,064 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:25:01,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3119420.0, ans=0.07
2023-11-27 15:25:19,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3119553.3333333335, ans=0.125
2023-11-27 15:25:35,475 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 467950
2023-11-27 15:25:40,968 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11050, loss[loss=0.08686, simple_loss=0.1131, pruned_loss=0.02195, audio_tagging_loss=0.008362, over 15339.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09118, pruned_loss=0.01291, audio_tagging_loss=0.008744, over 3045321.84 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:25:41,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3119686.6666666665, ans=0.04949747468305833
2023-11-27 15:25:42,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3119686.6666666665, ans=0.0
2023-11-27 15:25:58,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0
2023-11-27 15:25:59,905 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:26:04,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3119820.0, ans=0.125
2023-11-27 15:26:11,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3119820.0, ans=0.1
2023-11-27 15:26:31,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468000
2023-11-27 15:26:37,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.786e+01 9.414e+01 9.890e+01 1.526e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-27 15:26:39,335 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11100, loss[loss=0.06786, simple_loss=0.09733, pruned_loss=0.01024, audio_tagging_loss=0.008947, over 15719.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09131, pruned_loss=0.01289, audio_tagging_loss=0.008892, over 3046032.22 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:27:05,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0
2023-11-27 15:27:06,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3120153.3333333335, ans=0.125
2023-11-27 15:27:30,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468050
2023-11-27 15:27:36,997 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11150, loss[loss=0.07737, simple_loss=0.1079, pruned_loss=0.01478, audio_tagging_loss=0.00864, over 14828.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09049, pruned_loss=0.01265, audio_tagging_loss=0.008934, over 3042629.84 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0
2023-11-27 15:27:54,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0
2023-11-27 15:28:20,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3120553.3333333335, ans=0.125
2023-11-27 15:28:29,084 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468100
2023-11-27 15:28:32,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.619e+01 9.079e+01 9.995e+01 1.250e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-27 15:28:34,471 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11200, loss[loss=0.05931, simple_loss=0.08444, pruned_loss=0.008684, audio_tagging_loss=0.008412, over 15470.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08946, pruned_loss=0.01246, audio_tagging_loss=0.009041, over 3041872.31 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:28:38,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3120686.6666666665, ans=0.125
2023-11-27 15:28:51,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3120753.3333333335, ans=0.2
2023-11-27 15:28:53,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3120753.3333333335, ans=0.125
2023-11-27 15:29:06,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3120820.0, ans=0.09899494936611666
2023-11-27 15:29:06,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3120820.0, ans=0.0
2023-11-27 15:29:25,885 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468150
2023-11-27 15:29:27,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3120953.3333333335, ans=0.125
2023-11-27 15:29:28,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0
2023-11-27 15:29:31,261 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11250, loss[loss=0.06899, simple_loss=0.09155, pruned_loss=0.01242, audio_tagging_loss=0.0108, over 15292.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08859, pruned_loss=0.01242, audio_tagging_loss=0.009003, over 3052706.93 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:29:47,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3121086.6666666665, ans=0.125
2023-11-27 15:30:04,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3121153.3333333335, ans=0.0
2023-11-27 15:30:07,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3121220.0, ans=0.0
2023-11-27 15:30:22,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468200
2023-11-27 15:30:27,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.697e+01 9.472e+01 1.045e+02 1.319e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-27 15:30:28,852 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11300, loss[loss=0.06564, simple_loss=0.08502, pruned_loss=0.01567, audio_tagging_loss=0.007461, over 14365.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08952, pruned_loss=0.01247, audio_tagging_loss=0.008783, over 3061430.85 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:30:47,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3121420.0, ans=0.0
2023-11-27 15:30:57,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121486.6666666665, ans=0.1
2023-11-27 15:31:06,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3121553.3333333335, ans=0.2
2023-11-27 15:31:09,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3121553.3333333335, ans=0.05
2023-11-27 15:31:09,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=12.0
2023-11-27 15:31:20,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468250
2023-11-27 15:31:20,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0
2023-11-27 15:31:26,445 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11350, loss[loss=0.05912, simple_loss=0.07972, pruned_loss=0.01173, audio_tagging_loss=0.007529, over 15051.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09028, pruned_loss=0.01262, audio_tagging_loss=0.008655, over 3057038.93 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:32:17,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468300
2023-11-27 15:32:21,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.462e+01 9.162e+01 9.878e+01 1.221e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-27 15:32:22,616 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11400, loss[loss=0.07179, simple_loss=0.09816, pruned_loss=0.01416, audio_tagging_loss=0.00855, over 15022.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09012, pruned_loss=0.0127, audio_tagging_loss=0.008584, over 3051535.96 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:33:01,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3122220.0, ans=0.2
2023-11-27 15:33:11,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3122286.6666666665, ans=0.125
2023-11-27 15:33:12,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3122286.6666666665, ans=0.0
2023-11-27 15:33:13,423 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468350
2023-11-27 15:33:14,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3122286.6666666665, ans=0.1
2023-11-27 15:33:18,864 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11450, loss[loss=0.07204, simple_loss=0.1067, pruned_loss=0.01224, audio_tagging_loss=0.006433, over 15198.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09092, pruned_loss=0.01288, audio_tagging_loss=0.008593, over 3046625.82 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:33:26,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3122353.3333333335, ans=0.1
2023-11-27 15:33:54,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3122553.3333333335, ans=0.125
2023-11-27 15:34:06,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3122620.0, ans=0.0
2023-11-27 15:34:11,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468400
2023-11-27 15:34:16,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.713e+01 9.522e+01 1.023e+02 1.434e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-27 15:34:17,528 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11500, loss[loss=0.06229, simple_loss=0.07957, pruned_loss=0.01259, audio_tagging_loss=0.009913, over 15117.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08962, pruned_loss=0.0127, audio_tagging_loss=0.008696, over 3042881.38 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:34:20,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2023-11-27 15:34:40,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3122820.0, ans=0.125
2023-11-27 15:34:49,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3122820.0, ans=10.0
2023-11-27 15:35:07,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3122953.3333333335, ans=0.125
2023-11-27 15:35:09,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468450
2023-11-27 15:35:15,057 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11550, loss[loss=0.0753, simple_loss=0.1042, pruned_loss=0.01663, audio_tagging_loss=0.006542, over 15578.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09054, pruned_loss=0.01282, audio_tagging_loss=0.008678, over 3049994.39 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 8.0
2023-11-27 15:35:22,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3123020.0, ans=0.0
2023-11-27 15:35:45,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3123153.3333333335, ans=0.0
2023-11-27 15:35:52,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3123220.0, ans=0.2
2023-11-27 15:35:54,365 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:35:56,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3123220.0, ans=0.125
2023-11-27 15:36:06,445 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468500
2023-11-27 15:36:09,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0
2023-11-27 15:36:10,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.754e+01 9.552e+01 1.002e+02 2.038e+02, threshold=1.910e+02, percent-clipped=1.0
2023-11-27 15:36:11,828 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11600, loss[loss=0.06563, simple_loss=0.08864, pruned_loss=0.01232, audio_tagging_loss=0.008997, over 15200.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09132, pruned_loss=0.01301, audio_tagging_loss=0.008652, over 3053834.25 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:36:12,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3123353.3333333335, ans=0.125
2023-11-27 15:36:16,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3123353.3333333335, ans=0.025
2023-11-27 15:36:25,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3123420.0, ans=0.0
2023-11-27 15:36:34,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3123486.6666666665, ans=0.0
2023-11-27 15:36:37,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3123486.6666666665, ans=0.5
2023-11-27 15:36:38,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3123486.6666666665, ans=0.125
2023-11-27 15:36:43,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3123486.6666666665, ans=0.125
2023-11-27 15:36:46,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3123553.3333333335, ans=0.0
2023-11-27 15:36:47,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0
2023-11-27 15:37:03,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468550
2023-11-27 15:37:09,515 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11650, loss[loss=0.06286, simple_loss=0.08036, pruned_loss=0.01308, audio_tagging_loss=0.009606, over 14107.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09126, pruned_loss=0.01297, audio_tagging_loss=0.00869, over 3051921.12 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 16.0
], batch size: 53, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:37:29,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3123753.3333333335, ans=0.1 2023-11-27 15:37:30,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3123753.3333333335, ans=0.1 2023-11-27 15:37:58,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3123953.3333333335, ans=0.1 2023-11-27 15:38:01,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468600 2023-11-27 15:38:06,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.764e+01 9.242e+01 1.012e+02 1.452e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 15:38:07,819 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11700, loss[loss=0.07644, simple_loss=0.1109, pruned_loss=0.01447, audio_tagging_loss=0.006504, over 15164.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.0912, pruned_loss=0.01288, audio_tagging_loss=0.008695, over 3054818.32 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:38:18,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:38:28,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-27 15:38:32,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3124153.3333333335, ans=0.125 2023-11-27 15:38:59,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468650 2023-11-27 15:39:04,393 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11750, loss[loss=0.07039, simple_loss=0.09841, pruned_loss=0.01295, audio_tagging_loss=0.00824, over 15189.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09149, pruned_loss=0.01293, audio_tagging_loss=0.008707, over 3056152.97 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:39:04,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3124353.3333333335, ans=0.125 2023-11-27 15:39:10,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2023-11-27 15:39:22,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3124420.0, ans=0.125 2023-11-27 15:39:31,219 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:39:51,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3124620.0, ans=0.1 2023-11-27 15:39:56,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468700 2023-11-27 15:40:00,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.569e+01 9.104e+01 9.733e+01 1.192e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 15:40:01,919 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11800, loss[loss=0.07571, simple_loss=0.09947, pruned_loss=0.01838, audio_tagging_loss=0.0076, over 14830.00 frames. 
], tot_loss[loss=0.06702, simple_loss=0.09083, pruned_loss=0.01287, audio_tagging_loss=0.00874, over 3052676.09 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:40:13,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3124753.3333333335, ans=0.1 2023-11-27 15:40:32,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3124820.0, ans=0.0 2023-11-27 15:40:32,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3124820.0, ans=0.2 2023-11-27 15:40:53,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468750 2023-11-27 15:40:56,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3124953.3333333335, ans=0.1 2023-11-27 15:40:59,250 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11850, loss[loss=0.06455, simple_loss=0.08672, pruned_loss=0.0119, audio_tagging_loss=0.009293, over 13558.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09097, pruned_loss=0.01287, audio_tagging_loss=0.008798, over 3046645.79 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:41:03,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=22.5 2023-11-27 15:41:42,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3125220.0, ans=0.2 2023-11-27 15:41:43,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2023-11-27 15:41:50,329 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468800 2023-11-27 15:41:55,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.519e+01 9.146e+01 9.837e+01 1.247e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 15:41:56,671 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11900, loss[loss=0.05211, simple_loss=0.07077, pruned_loss=0.009342, audio_tagging_loss=0.007386, over 15432.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09094, pruned_loss=0.0128, audio_tagging_loss=0.008928, over 3048107.80 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:42:12,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3125420.0, ans=0.035 2023-11-27 15:42:47,629 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468850 2023-11-27 15:42:47,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3125620.0, ans=0.2 2023-11-27 15:42:53,439 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 11950, loss[loss=0.06352, simple_loss=0.08775, pruned_loss=0.009977, audio_tagging_loss=0.00967, over 14879.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09131, pruned_loss=0.01279, audio_tagging_loss=0.008922, over 3054921.94 frames. 
], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:43:01,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3125686.6666666665, ans=0.125 2023-11-27 15:43:15,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3125820.0, ans=0.125 2023-11-27 15:43:19,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3125820.0, ans=0.0 2023-11-27 15:43:20,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3125820.0, ans=0.125 2023-11-27 15:43:25,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2023-11-27 15:43:43,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3125953.3333333335, ans=0.125 2023-11-27 15:43:44,372 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468900 2023-11-27 15:43:48,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.695e+01 9.240e+01 9.952e+01 1.274e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 15:43:49,574 INFO [train_asr.py:1235] (1/4) Epoch 39, batch 12000, loss[loss=0.0793, simple_loss=0.1117, pruned_loss=0.0168, audio_tagging_loss=0.00663, over 15546.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09117, pruned_loss=0.01274, audio_tagging_loss=0.00902, over 3052929.49 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2023-11-27 15:43:49,574 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 15:44:09,527 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3325, 5.0432, 4.7152, 5.1709], device='cuda:1') 2023-11-27 15:44:24,029 INFO [train_asr.py:1267] (1/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05064, pruned_loss=0.005162, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-27 15:44:24,030 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 15:44:35,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-27 15:44:40,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-27 15:44:44,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3126153.3333333335, ans=0.125 2023-11-27 15:45:06,135 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 0, loss[loss=0.07303, simple_loss=0.08986, pruned_loss=0.006395, audio_tagging_loss=0.02171, over 14604.00 frames. ], tot_loss[loss=0.07303, simple_loss=0.08986, pruned_loss=0.006395, audio_tagging_loss=0.02171, over 14604.00 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:45:06,135 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 15:45:41,195 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05772, simple_loss=0.0507, pruned_loss=0.005215, audio_tagging_loss=0.02715, over 4681554.00 frames. 
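The loss fields in these records fit a fixed linear combination: tot_loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. For the batch 12000 record, 0.5 * 0.09117 + 0.01274 + 0.00902 = 0.06734, and the validation record obeys the same identity (0.5 * 0.05064 + 0.005162 + 0.02718 = 0.05766). A quick check, with the 0.5 and 1.0 weights inferred from the logged numbers rather than quoted from the recipe:

    # Verify the apparent loss weighting against the batch-12000 record.
    # The 0.5 / 1.0 weights are inferred from the logged values themselves.
    simple_loss, pruned_loss, at_loss = 0.09117, 0.01274, 0.00902
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * at_loss
    assert abs(loss - 0.06734) < 1e-4   # matches the logged tot_loss

The drop in lr from 1.72e-03 to 1.70e-03 at the epoch 39 -> 40 boundary is consistent with a schedule that decays with epoch as well as batch count, as icefall's Eden scheduler does.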
2023-11-27 15:45:41,195 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 15:45:45,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-27 15:45:47,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2023-11-27 15:45:53,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3126253.3333333335, ans=0.125 2023-11-27 15:45:57,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5 2023-11-27 15:46:04,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 468950 2023-11-27 15:46:11,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3126320.0, ans=0.2 2023-11-27 15:46:39,233 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 50, loss[loss=0.07449, simple_loss=0.09418, pruned_loss=0.0117, audio_tagging_loss=0.0157, over 15463.00 frames. ], tot_loss[loss=0.07806, simple_loss=0.0945, pruned_loss=0.01389, audio_tagging_loss=0.01693, over 685141.79 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:46:55,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2023-11-27 15:46:55,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3126586.6666666665, ans=0.125 2023-11-27 15:47:01,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469000 2023-11-27 15:47:05,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-27 15:47:06,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.209e+01 9.877e+01 1.086e+02 2.497e+02, threshold=1.975e+02, percent-clipped=1.0 2023-11-27 15:47:30,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3126786.6666666665, ans=0.05 2023-11-27 15:47:35,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3126853.3333333335, ans=0.5 2023-11-27 15:47:36,538 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 100, loss[loss=0.08464, simple_loss=0.1059, pruned_loss=0.01645, audio_tagging_loss=0.01523, over 15414.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09297, pruned_loss=0.01327, audio_tagging_loss=0.01596, over 1212810.28 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:47:37,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3126853.3333333335, ans=0.0 2023-11-27 15:47:45,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3126853.3333333335, ans=0.1 2023-11-27 15:47:49,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3126920.0, ans=0.125 2023-11-27 15:48:00,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469050 2023-11-27 15:48:09,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-27 15:48:10,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-27 15:48:15,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-27 15:48:16,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-27 15:48:17,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127053.3333333335, ans=0.1 2023-11-27 15:48:23,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3127120.0, ans=0.125 2023-11-27 15:48:34,256 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 150, loss[loss=0.08183, simple_loss=0.1093, pruned_loss=0.01679, audio_tagging_loss=0.01038, over 15362.00 frames. ], tot_loss[loss=0.07376, simple_loss=0.09291, pruned_loss=0.01318, audio_tagging_loss=0.01413, over 1621443.54 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:48:40,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2023-11-27 15:48:55,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3127253.3333333335, ans=0.0 2023-11-27 15:48:58,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469100 2023-11-27 15:49:01,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3127320.0, ans=0.0 2023-11-27 15:49:03,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 9.129e+01 9.870e+01 1.058e+02 1.571e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-27 15:49:14,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=12.0 2023-11-27 15:49:18,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. 
limit=22.5 2023-11-27 15:49:24,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3127453.3333333335, ans=0.125 2023-11-27 15:49:32,953 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 200, loss[loss=0.06863, simple_loss=0.1044, pruned_loss=0.01028, audio_tagging_loss=0.006151, over 14974.00 frames. ], tot_loss[loss=0.07153, simple_loss=0.09186, pruned_loss=0.01292, audio_tagging_loss=0.01268, over 1934836.48 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:49:33,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127520.0, ans=0.1 2023-11-27 15:49:36,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3127520.0, ans=0.0 2023-11-27 15:49:55,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469150 2023-11-27 15:49:55,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-27 15:50:06,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3127720.0, ans=0.0 2023-11-27 15:50:29,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3127853.3333333335, ans=0.09899494936611666 2023-11-27 15:50:29,994 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 250, loss[loss=0.07004, simple_loss=0.1014, pruned_loss=0.00994, audio_tagging_loss=0.009384, over 15043.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09216, pruned_loss=0.01305, audio_tagging_loss=0.01159, over 2181666.38 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:50:43,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3127920.0, ans=0.05 2023-11-27 15:50:45,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3127920.0, ans=0.0 2023-11-27 15:50:53,425 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469200 2023-11-27 15:50:59,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.939e+01 9.454e+01 1.026e+02 1.364e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 15:51:06,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128053.3333333335, ans=0.1 2023-11-27 15:51:16,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3128120.0, ans=0.125 2023-11-27 15:51:17,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3128120.0, ans=0.125 2023-11-27 15:51:19,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3128120.0, ans=0.04949747468305833 2023-11-27 15:51:26,673 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 300, loss[loss=0.06916, simple_loss=0.09598, pruned_loss=0.01424, audio_tagging_loss=0.006933, over 14518.00 frames. ], tot_loss[loss=0.06992, simple_loss=0.09222, pruned_loss=0.01301, audio_tagging_loss=0.0108, over 2373349.66 frames. 
], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:51:32,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3128186.6666666665, ans=0.0 2023-11-27 15:51:37,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-27 15:51:50,357 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469250 2023-11-27 15:51:50,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0 2023-11-27 15:52:01,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3128386.6666666665, ans=0.0 2023-11-27 15:52:18,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3128453.3333333335, ans=0.0 2023-11-27 15:52:24,504 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 350, loss[loss=0.0593, simple_loss=0.07804, pruned_loss=0.0131, audio_tagging_loss=0.007189, over 15900.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.09252, pruned_loss=0.01293, audio_tagging_loss=0.01025, over 2525690.44 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:52:36,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3128586.6666666665, ans=0.125 2023-11-27 15:52:36,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3128586.6666666665, ans=0.025 2023-11-27 15:52:37,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3128586.6666666665, ans=0.125 2023-11-27 15:52:45,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3128653.3333333335, ans=0.0 2023-11-27 15:52:46,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469300 2023-11-27 15:52:47,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2023-11-27 15:52:49,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3128653.3333333335, ans=0.125 2023-11-27 15:52:52,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.667e+01 9.273e+01 1.018e+02 1.811e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 15:53:13,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2023-11-27 15:53:15,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3128786.6666666665, ans=0.0 2023-11-27 15:53:21,708 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 400, loss[loss=0.07188, simple_loss=0.0987, pruned_loss=0.01534, audio_tagging_loss=0.007193, over 15481.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09291, pruned_loss=0.0129, audio_tagging_loss=0.009721, over 2639566.61 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:53:22,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3128853.3333333335, ans=0.2 2023-11-27 15:53:22,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=22.5 2023-11-27 15:53:23,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3128853.3333333335, ans=0.125 2023-11-27 15:53:25,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3128853.3333333335, ans=0.2 2023-11-27 15:53:41,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3128920.0, ans=0.0 2023-11-27 15:53:44,319 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469350 2023-11-27 15:54:08,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2023-11-27 15:54:13,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2023-11-27 15:54:17,426 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 450, loss[loss=0.07, simple_loss=0.09035, pruned_loss=0.01318, audio_tagging_loss=0.01164, over 14461.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09155, pruned_loss=0.01274, audio_tagging_loss=0.00953, over 2728052.53 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:54:20,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3129186.6666666665, ans=0.025 2023-11-27 15:54:41,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469400 2023-11-27 15:54:47,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3129320.0, ans=0.125 2023-11-27 15:54:48,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.585e+01 9.092e+01 1.003e+02 1.210e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-27 15:54:48,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3129320.0, ans=0.2 2023-11-27 15:54:49,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3129320.0, ans=0.125 2023-11-27 15:54:57,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3129386.6666666665, ans=0.0 2023-11-27 15:55:00,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3129386.6666666665, ans=0.04949747468305833 2023-11-27 15:55:14,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3129453.3333333335, ans=0.0 2023-11-27 15:55:16,569 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 500, loss[loss=0.06067, simple_loss=0.08654, pruned_loss=0.008141, audio_tagging_loss=0.009257, over 16454.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09252, pruned_loss=0.01269, audio_tagging_loss=0.009246, over 2803719.99 frames. 
], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:55:17,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-27 15:55:25,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3129520.0, ans=0.2 2023-11-27 15:55:31,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-27 15:55:39,475 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469450 2023-11-27 15:56:04,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3129786.6666666665, ans=0.0 2023-11-27 15:56:14,249 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 550, loss[loss=0.07331, simple_loss=0.08989, pruned_loss=0.01893, audio_tagging_loss=0.009434, over 13351.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09225, pruned_loss=0.01288, audio_tagging_loss=0.009211, over 2852454.44 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:56:16,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3129853.3333333335, ans=0.125 2023-11-27 15:56:24,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3129920.0, ans=0.125 2023-11-27 15:56:26,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3129920.0, ans=0.125 2023-11-27 15:56:29,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3129920.0, ans=15.0 2023-11-27 15:56:31,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3129920.0, ans=0.2 2023-11-27 15:56:36,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469500 2023-11-27 15:56:37,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3129986.6666666665, ans=0.2 2023-11-27 15:56:40,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3129986.6666666665, ans=0.125 2023-11-27 15:56:44,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.483e+01 9.154e+01 9.792e+01 1.177e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-27 15:56:55,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-27 15:56:55,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-27 15:56:58,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-27 15:56:58,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=22.5 2023-11-27 15:57:11,342 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 600, loss[loss=0.06923, simple_loss=0.0934, pruned_loss=0.01333, audio_tagging_loss=0.009197, over 16768.00 frames. 
], tot_loss[loss=0.06777, simple_loss=0.09164, pruned_loss=0.01277, audio_tagging_loss=0.009186, over 2895682.42 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:57:21,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3130186.6666666665, ans=0.125 2023-11-27 15:57:35,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469550 2023-11-27 15:57:42,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3130320.0, ans=0.125 2023-11-27 15:57:43,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3130320.0, ans=0.125 2023-11-27 15:57:55,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3130386.6666666665, ans=0.0 2023-11-27 15:57:57,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3130453.3333333335, ans=0.125 2023-11-27 15:58:09,179 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 650, loss[loss=0.05935, simple_loss=0.0784, pruned_loss=0.009772, audio_tagging_loss=0.01038, over 15575.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09154, pruned_loss=0.01272, audio_tagging_loss=0.009132, over 2928567.70 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:58:11,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3130520.0, ans=0.2 2023-11-27 15:58:16,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3130520.0, ans=0.125 2023-11-27 15:58:32,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469600 2023-11-27 15:58:40,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.671e+01 9.149e+01 9.953e+01 1.299e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 15:59:01,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3130786.6666666665, ans=0.0 2023-11-27 15:59:07,568 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 700, loss[loss=0.06674, simple_loss=0.09525, pruned_loss=0.0129, audio_tagging_loss=0.006214, over 15096.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09051, pruned_loss=0.01275, audio_tagging_loss=0.009154, over 2950322.86 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 8.0 2023-11-27 15:59:13,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2023-11-27 15:59:24,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-27 15:59:30,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469650 2023-11-27 15:59:47,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3131053.3333333335, ans=0.0 2023-11-27 15:59:48,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.52 vs. 
limit=10.0 2023-11-27 15:59:49,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3131053.3333333335, ans=0.125 2023-11-27 15:59:56,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=8.0 2023-11-27 15:59:57,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3131120.0, ans=0.125 2023-11-27 16:00:01,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3131120.0, ans=0.2 2023-11-27 16:00:05,239 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 750, loss[loss=0.06236, simple_loss=0.08853, pruned_loss=0.01087, audio_tagging_loss=0.007225, over 14873.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.08992, pruned_loss=0.01285, audio_tagging_loss=0.009101, over 2966952.69 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 8.0 2023-11-27 16:00:09,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3131186.6666666665, ans=0.0 2023-11-27 16:00:09,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3131186.6666666665, ans=0.0 2023-11-27 16:00:21,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3131253.3333333335, ans=0.0 2023-11-27 16:00:28,340 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469700 2023-11-27 16:00:28,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3131320.0, ans=0.0 2023-11-27 16:00:36,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.681e+01 9.396e+01 9.945e+01 1.193e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 16:00:39,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2023-11-27 16:00:44,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3131386.6666666665, ans=0.07 2023-11-27 16:00:56,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131453.3333333335, ans=0.1 2023-11-27 16:01:03,222 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 800, loss[loss=0.07021, simple_loss=0.1025, pruned_loss=0.01167, audio_tagging_loss=0.00731, over 15896.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09106, pruned_loss=0.01299, audio_tagging_loss=0.009063, over 2981264.88 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:01:04,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.98 vs. 
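limit=22.5

The Whitening lines compare a statistic of a module's output covariance against a limit; the decorrelation penalty is only applied when the metric exceeds the limit (here 16.98 vs. 22.5, so none is applied). Below is a rough sketch of such a metric, normalized so that perfectly whitened features (identity covariance) score 1.0; the exact definition in scaling.py is assumed, not copied.

    import torch

    # Assumed whitening metric: mean squared covariance entry relative to
    # the squared mean variance, scaled so identity covariance gives 1.0.
    # Strongly correlated channels push the value well above the limit.
    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        x = x.reshape(-1, x.shape[-1])        # (frames, channels)
        cov = (x.T @ x) / x.shape[0]          # channel covariance
        mean_var = cov.diagonal().mean()
        return (cov ** 2).mean() * cov.shape[0] / (mean_var ** 2)

    x = torch.randn(1000, 256)                # already decorrelated input
    print(whitening_metric(x))                # ~1, far below limits like 22.5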
2023-11-27 16:01:19,996 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:01:24,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3131586.6666666665, ans=0.0 2023-11-27 16:01:26,321 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469750 2023-11-27 16:01:29,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3131653.3333333335, ans=0.04949747468305833 2023-11-27 16:01:53,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3131786.6666666665, ans=0.2 2023-11-27 16:01:55,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3131786.6666666665, ans=0.0 2023-11-27 16:02:00,639 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 850, loss[loss=0.06992, simple_loss=0.09828, pruned_loss=0.01291, audio_tagging_loss=0.007863, over 14303.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09128, pruned_loss=0.01302, audio_tagging_loss=0.00913, over 2995975.79 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:02:22,680 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469800 2023-11-27 16:02:29,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3131986.6666666665, ans=0.0 2023-11-27 16:02:31,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.724e+01 9.421e+01 1.007e+02 1.369e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 16:02:44,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.37 vs. limit=10.0 2023-11-27 16:02:47,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3132120.0, ans=0.125 2023-11-27 16:02:57,918 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 900, loss[loss=0.03225, simple_loss=0.03475, pruned_loss=0.003039, audio_tagging_loss=0.01184, over 15004.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09101, pruned_loss=0.01285, audio_tagging_loss=0.009115, over 3008343.49 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:03:12,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3132253.3333333335, ans=0.0 2023-11-27 16:03:18,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3132253.3333333335, ans=0.07 2023-11-27 16:03:20,900 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469850 2023-11-27 16:03:25,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3132320.0, ans=0.015 2023-11-27 16:03:25,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3132320.0, ans=0.025 2023-11-27 16:03:37,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs.
limit=15.0 2023-11-27 16:03:55,285 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 950, loss[loss=0.06515, simple_loss=0.08674, pruned_loss=0.01565, audio_tagging_loss=0.006123, over 15109.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.0905, pruned_loss=0.01276, audio_tagging_loss=0.009049, over 3017271.25 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:04:12,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3132586.6666666665, ans=0.125 2023-11-27 16:04:12,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3132586.6666666665, ans=0.1 2023-11-27 16:04:19,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469900 2023-11-27 16:04:26,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.703e+01 9.559e+01 1.057e+02 1.419e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 16:04:40,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-11-27 16:04:53,155 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1000, loss[loss=0.07679, simple_loss=0.112, pruned_loss=0.01628, audio_tagging_loss=0.00451, over 15627.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.0912, pruned_loss=0.01283, audio_tagging_loss=0.008922, over 3027195.37 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:05:16,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 469950 2023-11-27 16:05:19,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3132986.6666666665, ans=0.2 2023-11-27 16:05:20,835 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:05:24,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0 2023-11-27 16:05:41,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3133120.0, ans=0.1 2023-11-27 16:05:41,374 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:05:41,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-11-27 16:05:44,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3133120.0, ans=0.0 2023-11-27 16:05:48,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.45 vs. 
limit=22.5 2023-11-27 16:05:50,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3133186.6666666665, ans=0.125 2023-11-27 16:05:51,391 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1050, loss[loss=0.08804, simple_loss=0.1252, pruned_loss=0.01973, audio_tagging_loss=0.005725, over 16270.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09116, pruned_loss=0.01274, audio_tagging_loss=0.00876, over 3039907.17 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:05:51,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3133186.6666666665, ans=0.125 2023-11-27 16:06:06,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3133253.3333333335, ans=0.1 2023-11-27 16:06:08,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-27 16:06:09,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3133253.3333333335, ans=0.0 2023-11-27 16:06:14,247 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470000 2023-11-27 16:06:15,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3133320.0, ans=0.125 2023-11-27 16:06:22,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.934e+01 9.738e+01 1.038e+02 1.396e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 16:06:29,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2023-11-27 16:06:43,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3133453.3333333335, ans=0.0 2023-11-27 16:06:48,711 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1100, loss[loss=0.06998, simple_loss=0.09744, pruned_loss=0.01535, audio_tagging_loss=0.005909, over 15719.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09172, pruned_loss=0.01284, audio_tagging_loss=0.008647, over 3041428.82 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:06:55,075 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
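Number of tokens: 24

Aside from these occasional exclusion warnings, the bulk of the log consists of ScheduledFloat lines: regularization constants (dropout probabilities, balancer probabilities, skip rates) that are scheduled against batch_count instead of being fixed. A minimal sketch of a piecewise-linear schedule of this kind, with illustrative breakpoints that are not the recipe's:

    # Piecewise-linear schedule keyed on batch count; the breakpoints are
    # illustrative.  Past the last breakpoint the value stays constant,
    # which is why at batch_count ~3.13e6 the logged values look frozen.
    def scheduled_float(batch_count: float,
                        points=((0.0, 0.3), (20000.0, 0.1))) -> float:
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return points[-1][1]

    print(scheduled_float(3133853.0))   # 0.1: the schedule is long since flat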
2023-11-27 16:06:56,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3133520.0, ans=10.0 2023-11-27 16:06:58,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3133520.0, ans=0.025 2023-11-27 16:07:01,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3133586.6666666665, ans=0.0 2023-11-27 16:07:07,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3133586.6666666665, ans=0.125 2023-11-27 16:07:12,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470050 2023-11-27 16:07:13,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-27 16:07:18,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3133653.3333333335, ans=0.2 2023-11-27 16:07:23,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2023-11-27 16:07:38,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2023-11-27 16:07:46,815 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1150, loss[loss=0.06148, simple_loss=0.07469, pruned_loss=0.01363, audio_tagging_loss=0.01051, over 14762.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09125, pruned_loss=0.0129, audio_tagging_loss=0.008716, over 3042991.60 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:07:54,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3133853.3333333335, ans=0.0 2023-11-27 16:07:59,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3133920.0, ans=0.2 2023-11-27 16:08:10,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470100 2023-11-27 16:08:15,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3133986.6666666665, ans=0.125 2023-11-27 16:08:16,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3133986.6666666665, ans=0.125 2023-11-27 16:08:18,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.606e+01 9.243e+01 9.874e+01 1.339e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 16:08:19,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3133986.6666666665, ans=0.0 2023-11-27 16:08:23,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-27 16:08:30,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-27 16:08:31,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs.
limit=15.0 2023-11-27 16:08:40,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3134120.0, ans=0.125 2023-11-27 16:08:41,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3134120.0, ans=0.125 2023-11-27 16:08:43,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3134186.6666666665, ans=0.2 2023-11-27 16:08:44,536 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1200, loss[loss=0.07618, simple_loss=0.1113, pruned_loss=0.01267, audio_tagging_loss=0.00788, over 16272.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09077, pruned_loss=0.01273, audio_tagging_loss=0.008662, over 3043364.22 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:08:45,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3134186.6666666665, ans=0.125 2023-11-27 16:09:08,374 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470150 2023-11-27 16:09:21,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.24 vs. limit=10.0 2023-11-27 16:09:42,467 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1250, loss[loss=0.06901, simple_loss=0.09783, pruned_loss=0.01134, audio_tagging_loss=0.008753, over 15774.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0899, pruned_loss=0.0125, audio_tagging_loss=0.008665, over 3047653.48 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:09:43,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-27 16:09:44,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0 2023-11-27 16:09:50,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-27 16:09:51,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3134520.0, ans=0.2 2023-11-27 16:10:05,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2023-11-27 16:10:06,066 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470200 2023-11-27 16:10:06,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3134653.3333333335, ans=0.0 2023-11-27 16:10:10,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3134653.3333333335, ans=0.95 2023-11-27 16:10:14,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.573e+01 9.266e+01 9.865e+01 1.338e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 16:10:36,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. 
limit=15.0 2023-11-27 16:10:40,914 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1300, loss[loss=0.08356, simple_loss=0.1221, pruned_loss=0.01572, audio_tagging_loss=0.006791, over 14343.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09125, pruned_loss=0.01273, audio_tagging_loss=0.008645, over 3051515.51 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:10:48,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3134853.3333333335, ans=0.125 2023-11-27 16:11:03,373 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470250 2023-11-27 16:11:38,498 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1350, loss[loss=0.07507, simple_loss=0.1044, pruned_loss=0.01565, audio_tagging_loss=0.007246, over 16065.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09088, pruned_loss=0.0126, audio_tagging_loss=0.008718, over 3049807.43 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:12:01,840 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470300 2023-11-27 16:12:09,982 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.641e+01 9.240e+01 9.975e+01 1.416e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 16:12:21,705 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:12:23,890 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:12:24,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3135453.3333333335, ans=0.05 2023-11-27 16:12:24,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-27 16:12:26,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=15.0 2023-11-27 16:12:29,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3135453.3333333335, ans=0.125 2023-11-27 16:12:32,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3135453.3333333335, ans=0.125 2023-11-27 16:12:36,565 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1400, loss[loss=0.07903, simple_loss=0.1077, pruned_loss=0.0165, audio_tagging_loss=0.008667, over 16023.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09066, pruned_loss=0.01275, audio_tagging_loss=0.008784, over 3054476.39 frames. 
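], batch size: 59, lr: 1.70e-03, grad_scale: 32.0

grad_scale in the record just closed is the fp16 loss scale, and its movement between 8.0, 16.0 and 32.0 across these records is ordinary dynamic loss scaling: the scale is halved when a step produces inf/nan gradients and grows back after a run of clean steps. The standard PyTorch pattern looks like the following generic loop, shown for illustration rather than as the recipe's training code:

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()           # maintains the loss scale
    for _ in range(5):
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()              # scale up vs. fp16 underflow
        scaler.step(opt)                           # skipped on inf/nan grads
        scaler.update()                            # halve or grow the scale
    print(scaler.get_scale())                      # the quantity logged as grad_scale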
2023-11-27 16:12:42,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135520.0, ans=0.1 2023-11-27 16:12:57,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3135586.6666666665, ans=0.5 2023-11-27 16:12:58,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3135653.3333333335, ans=0.2 2023-11-27 16:12:59,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470350 2023-11-27 16:13:17,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3135720.0, ans=10.0 2023-11-27 16:13:19,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3135720.0, ans=0.04949747468305833 2023-11-27 16:13:28,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3135786.6666666665, ans=0.0 2023-11-27 16:13:33,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-11-27 16:13:34,925 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1450, loss[loss=0.07046, simple_loss=0.0881, pruned_loss=0.01542, audio_tagging_loss=0.01099, over 15451.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09093, pruned_loss=0.01273, audio_tagging_loss=0.00875, over 3045745.38 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:13:43,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135853.3333333335, ans=0.1 2023-11-27 16:13:47,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.29 vs.
limit=15.0 2023-11-27 16:13:51,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3135920.0, ans=0.125 2023-11-27 16:13:51,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3135920.0, ans=0.125 2023-11-27 16:13:53,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3135920.0, ans=0.125 2023-11-27 16:13:57,720 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470400 2023-11-27 16:14:05,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.750e+01 9.513e+01 1.013e+02 1.289e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 16:14:16,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3136053.3333333335, ans=0.125 2023-11-27 16:14:28,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3136120.0, ans=0.125 2023-11-27 16:14:30,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136120.0, ans=0.1 2023-11-27 16:14:32,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3136186.6666666665, ans=0.2 2023-11-27 16:14:32,818 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1500, loss[loss=0.0617, simple_loss=0.08177, pruned_loss=0.01046, audio_tagging_loss=0.01036, over 14949.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09021, pruned_loss=0.01266, audio_tagging_loss=0.008926, over 3048419.15 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:14:42,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3136253.3333333335, ans=0.125 2023-11-27 16:14:50,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3136253.3333333335, ans=0.0 2023-11-27 16:14:56,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470450 2023-11-27 16:15:08,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3136386.6666666665, ans=0.125 2023-11-27 16:15:18,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-27 16:15:25,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-27 16:15:27,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-27 16:15:30,269 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1550, loss[loss=0.05787, simple_loss=0.07232, pruned_loss=0.01071, audio_tagging_loss=0.011, over 15856.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09062, pruned_loss=0.01285, audio_tagging_loss=0.008948, over 3043599.45 frames. 
], batch size: 62, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:15:40,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3136520.0, ans=0.125 2023-11-27 16:15:41,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136586.6666666665, ans=0.1 2023-11-27 16:15:53,786 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-27 16:16:03,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.701e+01 9.341e+01 1.017e+02 1.377e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:16:08,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3136720.0, ans=0.125 2023-11-27 16:16:10,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3136720.0, ans=0.125 2023-11-27 16:16:13,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3136720.0, ans=0.0 2023-11-27 16:16:16,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2023-11-27 16:16:28,062 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1600, loss[loss=0.07938, simple_loss=0.1074, pruned_loss=0.01789, audio_tagging_loss=0.00781, over 15104.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09087, pruned_loss=0.01285, audio_tagging_loss=0.00898, over 3045720.94 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:16:31,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3136853.3333333335, ans=15.0 2023-11-27 16:16:50,944 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-27 16:16:52,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3136986.6666666665, ans=0.125 2023-11-27 16:17:06,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3137053.3333333335, ans=0.125 2023-11-27 16:17:19,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3137120.0, ans=0.0 2023-11-27 16:17:26,220 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1650, loss[loss=0.06637, simple_loss=0.08452, pruned_loss=0.01366, audio_tagging_loss=0.01045, over 14094.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.0905, pruned_loss=0.01281, audio_tagging_loss=0.008964, over 3047559.78 frames. 
], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:17:31,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-27 16:17:34,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-27 16:17:48,394 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-27 16:17:58,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.945e+01 9.413e+01 1.026e+02 1.249e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 16:18:09,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137386.6666666665, ans=0.125 2023-11-27 16:18:23,910 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1700, loss[loss=0.08077, simple_loss=0.1055, pruned_loss=0.01882, audio_tagging_loss=0.009205, over 14768.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09134, pruned_loss=0.013, audio_tagging_loss=0.009027, over 3047256.02 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:18:31,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3137520.0, ans=0.125 2023-11-27 16:18:38,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-27 16:18:41,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-27 16:18:44,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-27 16:18:46,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:47,181 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-27 16:18:49,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3137653.3333333335, ans=0.1 2023-11-27 16:18:50,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3137653.3333333335, ans=0.07 2023-11-27 16:18:55,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:55,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:57,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-27 16:19:00,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-27 16:19:21,617 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1750, loss[loss=0.07008, simple_loss=0.09344, pruned_loss=0.01554, audio_tagging_loss=0.007823, over 15410.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09052, pruned_loss=0.01279, audio_tagging_loss=0.009021, over 3039336.61 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:19:32,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-27 16:19:32,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-11-27 16:19:36,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-11-27 16:19:40,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2023-11-27 16:19:41,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137920.0, ans=0.125 2023-11-27 16:19:43,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3137986.6666666665, ans=0.0 2023-11-27 16:19:44,799 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-27 16:19:54,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.663e+01 9.086e+01 9.649e+01 1.198e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 16:19:58,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3138053.3333333335, ans=0.125 2023-11-27 16:20:06,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3138120.0, ans=0.125 2023-11-27 16:20:19,436 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1800, loss[loss=0.09243, simple_loss=0.1218, pruned_loss=0.02075, audio_tagging_loss=0.01079, over 14652.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09067, pruned_loss=0.01269, audio_tagging_loss=0.008882, over 3028626.26 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:20:21,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3138186.6666666665, ans=0.0 2023-11-27 16:20:33,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2023-11-27 16:20:41,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-27 16:20:42,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=22.5 2023-11-27 16:20:47,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.04 vs. limit=10.0 2023-11-27 16:20:50,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3138320.0, ans=0.0 2023-11-27 16:20:51,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=15.0 2023-11-27 16:20:52,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138386.6666666665, ans=0.1 2023-11-27 16:21:09,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138453.3333333335, ans=0.125 2023-11-27 16:21:15,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3138520.0, ans=0.1 2023-11-27 16:21:16,708 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1850, loss[loss=0.05393, simple_loss=0.07185, pruned_loss=0.007949, audio_tagging_loss=0.01005, over 15006.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09037, pruned_loss=0.01251, audio_tagging_loss=0.008828, over 3032721.74 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:21:20,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3138520.0, ans=0.0 2023-11-27 16:21:40,223 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-27 16:21:50,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.620e+01 9.187e+01 9.919e+01 1.245e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 16:21:51,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3138720.0, ans=0.125 2023-11-27 16:21:53,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3138720.0, ans=0.0 2023-11-27 16:21:54,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3138720.0, ans=0.125 2023-11-27 16:22:01,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3138720.0, ans=0.125 2023-11-27 16:22:06,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3138786.6666666665, ans=0.125 2023-11-27 16:22:14,810 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1900, loss[loss=0.06056, simple_loss=0.07792, pruned_loss=0.01362, audio_tagging_loss=0.007973, over 14547.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09016, pruned_loss=0.01242, audio_tagging_loss=0.008759, over 3031501.97 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:22:23,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138853.3333333335, ans=0.1 2023-11-27 16:22:38,590 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-27 16:22:45,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0 2023-11-27 16:22:55,220 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:22:58,999 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:23:12,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.66 vs. 
limit=22.5 2023-11-27 16:23:12,692 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 1950, loss[loss=0.05683, simple_loss=0.07715, pruned_loss=0.009905, audio_tagging_loss=0.008353, over 15107.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09008, pruned_loss=0.01247, audio_tagging_loss=0.008763, over 3034479.11 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:23:22,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.08 vs. limit=22.5 2023-11-27 16:23:27,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=15.0 2023-11-27 16:23:33,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3139253.3333333335, ans=0.125 2023-11-27 16:23:35,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-27 16:23:44,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3139320.0, ans=0.2 2023-11-27 16:23:46,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.576e+01 9.160e+01 9.774e+01 1.352e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 16:23:46,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3139386.6666666665, ans=0.0 2023-11-27 16:23:49,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-27 16:23:51,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139386.6666666665, ans=0.1 2023-11-27 16:24:10,856 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2000, loss[loss=0.05426, simple_loss=0.07688, pruned_loss=0.009481, audio_tagging_loss=0.006337, over 14646.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08926, pruned_loss=0.01234, audio_tagging_loss=0.008795, over 3025663.29 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:24:26,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3139586.6666666665, ans=0.2 2023-11-27 16:24:33,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-27 16:24:34,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3139653.3333333335, ans=0.0 2023-11-27 16:24:57,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3139786.6666666665, ans=0.04949747468305833 2023-11-27 16:24:59,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3139786.6666666665, ans=0.0 2023-11-27 16:25:02,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3139786.6666666665, ans=0.1 2023-11-27 16:25:07,720 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2050, loss[loss=0.06306, simple_loss=0.08786, pruned_loss=0.01129, audio_tagging_loss=0.007841, over 15274.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08948, pruned_loss=0.01225, audio_tagging_loss=0.00879, over 3026448.75 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:25:17,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3139853.3333333335, ans=0.125 2023-11-27 16:25:31,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-27 16:25:38,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139986.6666666665, ans=0.1 2023-11-27 16:25:41,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.831e+01 9.330e+01 1.029e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 16:25:42,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3140053.3333333335, ans=0.2 2023-11-27 16:25:45,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3140053.3333333335, ans=0.125 2023-11-27 16:25:56,340 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:25:58,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3140120.0, ans=0.0 2023-11-27 16:26:05,849 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2100, loss[loss=0.08355, simple_loss=0.1148, pruned_loss=0.0177, audio_tagging_loss=0.008465, over 14055.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08994, pruned_loss=0.01256, audio_tagging_loss=0.008807, over 3024414.33 frames. ], batch size: 52, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:26:06,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140186.6666666665, ans=0.1 2023-11-27 16:26:13,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-27 16:26:16,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-27 16:26:21,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3140253.3333333335, ans=0.0 2023-11-27 16:26:29,093 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-27 16:26:31,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-27 16:26:57,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140453.3333333335, ans=0.1 2023-11-27 16:27:03,649 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2150, loss[loss=0.068, simple_loss=0.09188, pruned_loss=0.01288, audio_tagging_loss=0.009182, over 15578.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08938, pruned_loss=0.01237, audio_tagging_loss=0.008801, over 3020585.63 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:27:27,032 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-27 16:27:36,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.778e+01 9.339e+01 1.008e+02 1.312e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:27:42,465 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:28:01,139 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2200, loss[loss=0.06376, simple_loss=0.08038, pruned_loss=0.01277, audio_tagging_loss=0.0108, over 15267.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.0903, pruned_loss=0.01248, audio_tagging_loss=0.008756, over 3032422.35 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:28:01,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2023-11-27 16:28:02,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3140853.3333333335, ans=0.0 2023-11-27 16:28:24,814 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-27 16:28:36,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3141053.3333333335, ans=0.0 2023-11-27 16:28:45,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3141053.3333333335, ans=0.1 2023-11-27 16:28:48,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:50,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2023-11-27 16:28:53,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:57,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:58,790 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2250, loss[loss=0.09026, simple_loss=0.126, pruned_loss=0.02098, audio_tagging_loss=0.006266, over 14650.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09115, pruned_loss=0.01272, audio_tagging_loss=0.008722, over 3036488.16 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:29:15,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3141253.3333333335, ans=0.025 2023-11-27 16:29:21,761 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-27 16:29:32,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2023-11-27 16:29:33,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.710e+01 9.342e+01 1.003e+02 1.212e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:29:33,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3141386.6666666665, ans=0.0 2023-11-27 16:29:47,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3141453.3333333335, ans=0.015 2023-11-27 16:29:49,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-27 16:29:51,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3141453.3333333335, ans=0.2 2023-11-27 16:29:57,618 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2300, loss[loss=0.09253, simple_loss=0.1319, pruned_loss=0.01872, audio_tagging_loss=0.007884, over 15999.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09177, pruned_loss=0.01283, audio_tagging_loss=0.008852, over 3036227.09 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:30:07,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3141586.6666666665, ans=0.2 2023-11-27 16:30:19,825 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-27 16:30:43,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-27 16:30:44,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3141786.6666666665, ans=0.2 2023-11-27 16:30:45,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3141786.6666666665, ans=0.0 2023-11-27 16:30:50,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-27 16:30:51,136 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:30:54,418 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2350, loss[loss=0.05511, simple_loss=0.07614, pruned_loss=0.00844, audio_tagging_loss=0.008603, over 14768.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09182, pruned_loss=0.01294, audio_tagging_loss=0.008851, over 3040766.13 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:30:55,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141853.3333333335, ans=0.1 2023-11-27 16:30:56,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3141853.3333333335, ans=0.0 2023-11-27 16:31:09,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141920.0, ans=0.1 2023-11-27 16:31:14,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3141920.0, ans=0.0 2023-11-27 16:31:18,118 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-27 16:31:24,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3141986.6666666665, ans=0.2 2023-11-27 16:31:26,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0 2023-11-27 16:31:29,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.596e+01 9.412e+01 1.004e+02 1.275e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 16:31:32,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3142053.3333333335, ans=0.125 2023-11-27 16:31:35,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3142053.3333333335, ans=0.2 2023-11-27 16:31:47,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3142120.0, ans=0.0 2023-11-27 16:31:52,510 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2400, loss[loss=0.08035, simple_loss=0.1149, pruned_loss=0.01404, audio_tagging_loss=0.008859, over 15769.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09047, pruned_loss=0.01258, audio_tagging_loss=0.008972, over 3040496.66 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:04,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3142253.3333333335, ans=0.125 2023-11-27 16:32:08,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2023-11-27 16:32:15,886 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-27 16:32:25,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142320.0, ans=0.1 2023-11-27 16:32:41,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2023-11-27 16:32:50,644 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2450, loss[loss=0.07439, simple_loss=0.09866, pruned_loss=0.01423, audio_tagging_loss=0.01082, over 15378.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09013, pruned_loss=0.0124, audio_tagging_loss=0.009077, over 3041428.38 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:33:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3142586.6666666665, ans=15.0 2023-11-27 16:33:10,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-27 16:33:13,839 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-27 16:33:16,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3142653.3333333335, ans=0.0 2023-11-27 16:33:25,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.627e+01 9.278e+01 9.943e+01 1.246e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 16:33:25,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142720.0, ans=0.1 2023-11-27 16:33:29,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3142720.0, ans=0.0 2023-11-27 16:33:34,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3142720.0, ans=0.05 2023-11-27 16:33:39,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3142786.6666666665, ans=0.125 2023-11-27 16:33:40,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3142786.6666666665, ans=0.2 2023-11-27 16:33:48,568 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2500, loss[loss=0.09144, simple_loss=0.1226, pruned_loss=0.02331, audio_tagging_loss=0.006835, over 15114.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09057, pruned_loss=0.01268, audio_tagging_loss=0.009121, over 3044416.28 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:33:52,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3142853.3333333335, ans=0.125 2023-11-27 16:34:01,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=15.0 2023-11-27 16:34:09,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3142920.0, ans=0.2 2023-11-27 16:34:09,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142920.0, ans=0.125 2023-11-27 16:34:11,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-27 16:34:15,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3142986.6666666665, ans=0.0 2023-11-27 16:34:25,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2023-11-27 16:34:46,512 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2550, loss[loss=0.07534, simple_loss=0.1016, pruned_loss=0.01672, audio_tagging_loss=0.007823, over 15524.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09058, pruned_loss=0.01268, audio_tagging_loss=0.009034, over 3046577.60 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:34:50,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-27 16:35:09,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-27 16:35:21,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.534e+01 9.124e+01 9.895e+01 1.510e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-27 16:35:22,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2023-11-27 16:35:44,627 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2600, loss[loss=0.06193, simple_loss=0.08783, pruned_loss=0.009704, audio_tagging_loss=0.008307, over 15240.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09021, pruned_loss=0.01256, audio_tagging_loss=0.008884, over 3048134.46 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:35:50,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3143520.0, ans=0.125 2023-11-27 16:36:07,269 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-27 16:36:20,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3143720.0, ans=0.1 2023-11-27 16:36:35,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3143786.6666666665, ans=0.0 2023-11-27 16:36:36,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3143786.6666666665, ans=0.07 2023-11-27 16:36:41,777 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2650, loss[loss=0.08313, simple_loss=0.1171, pruned_loss=0.01883, audio_tagging_loss=0.005731, over 17213.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09055, pruned_loss=0.01276, audio_tagging_loss=0.008719, over 3052230.42 frames. ], batch size: 65, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:36:43,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3143853.3333333335, ans=0.125 2023-11-27 16:37:05,536 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-27 16:37:11,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-27 16:37:16,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3144053.3333333335, ans=0.2 2023-11-27 16:37:18,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.789e+01 9.225e+01 1.011e+02 1.898e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 16:37:28,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3144120.0, ans=0.0 2023-11-27 16:37:41,113 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2700, loss[loss=0.06586, simple_loss=0.08356, pruned_loss=0.01406, audio_tagging_loss=0.01002, over 14664.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09026, pruned_loss=0.01288, audio_tagging_loss=0.008749, over 3053616.62 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:37:44,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3144186.6666666665, ans=0.0 2023-11-27 16:38:00,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-27 16:38:03,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-27 16:38:23,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3144386.6666666665, ans=0.125 2023-11-27 16:38:38,653 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2750, loss[loss=0.03973, simple_loss=0.05258, pruned_loss=0.005606, audio_tagging_loss=0.007831, over 15375.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08976, pruned_loss=0.0128, audio_tagging_loss=0.008739, over 3049203.87 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:38:43,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-11-27 16:38:53,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-27 16:39:00,731 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-27 16:39:14,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.654e+01 9.307e+01 9.925e+01 1.318e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 16:39:26,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3144786.6666666665, ans=0.0 2023-11-27 16:39:31,258 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:39:35,724 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2800, loss[loss=0.04275, simple_loss=0.05942, pruned_loss=0.005101, audio_tagging_loss=0.007936, over 13888.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08999, pruned_loss=0.01276, audio_tagging_loss=0.008782, over 3049844.36 frames. 
], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:39:44,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3144853.3333333335, ans=0.0 2023-11-27 16:39:51,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3144920.0, ans=0.05 2023-11-27 16:39:59,061 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-27 16:40:02,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3144986.6666666665, ans=0.0 2023-11-27 16:40:21,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3145120.0, ans=0.0 2023-11-27 16:40:33,063 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2850, loss[loss=0.09167, simple_loss=0.1257, pruned_loss=0.02013, audio_tagging_loss=0.00867, over 15023.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09013, pruned_loss=0.01268, audio_tagging_loss=0.008803, over 3049825.41 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:40:37,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-27 16:40:42,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3145186.6666666665, ans=0.0 2023-11-27 16:40:46,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-27 16:40:51,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-27 16:40:56,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-27 16:41:09,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2023-11-27 16:41:10,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.689e+01 9.342e+01 1.021e+02 1.296e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:41:31,285 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2900, loss[loss=0.07514, simple_loss=0.1045, pruned_loss=0.01593, audio_tagging_loss=0.006972, over 15458.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08967, pruned_loss=0.01267, audio_tagging_loss=0.008788, over 3051374.12 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:41:38,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. 
limit=15.0 2023-11-27 16:41:43,234 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:41:45,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3145586.6666666665, ans=0.0 2023-11-27 16:41:46,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3145586.6666666665, ans=0.1 2023-11-27 16:41:53,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-27 16:41:53,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-27 16:41:54,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-27 16:42:04,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3145720.0, ans=0.125 2023-11-27 16:42:28,714 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 2950, loss[loss=0.05894, simple_loss=0.07991, pruned_loss=0.0111, audio_tagging_loss=0.007883, over 15446.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09058, pruned_loss=0.01285, audio_tagging_loss=0.00875, over 3042087.68 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:42:52,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-27 16:43:00,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3145986.6666666665, ans=0.2 2023-11-27 16:43:05,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.644e+01 9.313e+01 9.896e+01 1.371e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 16:43:18,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-11-27 16:43:25,562 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3000, loss[loss=0.08067, simple_loss=0.1092, pruned_loss=0.01577, audio_tagging_loss=0.01032, over 15202.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09068, pruned_loss=0.01265, audio_tagging_loss=0.008805, over 3048686.51 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:43:25,563 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 16:43:45,576 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8286, 4.9740, 5.0857, 4.9290], device='cuda:1') 2023-11-27 16:44:00,563 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.0576, simple_loss=0.0507, pruned_loss=0.005183, audio_tagging_loss=0.02707, over 4681554.00 frames. 
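The loss fields in these entries fit a simple decomposition: every logged loss and tot_loss equals 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. 0.5 * 0.1221 + 0.01572 + 0.006791 = 0.08356 for batch 1300, and 0.5 * 0.0507 + 0.005183 + 0.02707 = 0.0576 for the validation entry just above), presumably the run's simple_loss_scale and audio_tagging_loss_scale settings. A minimal sketch of that bookkeeping, assuming this decomposition (the function name is illustrative, not from train_asr.py):

    SIMPLE_LOSS_SCALE = 0.5         # assumed; reproduces every entry in this log
    AUDIO_TAGGING_LOSS_SCALE = 1.0  # assumed; reproduces every entry in this log

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        # Weighted sum that reconstructs the reported "loss" field.
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # Validation entry above: 0.5 * 0.0507 + 0.005183 + 0.02707 ~= 0.0576
    assert abs(combined_loss(0.0507, 0.005183, 0.02707) - 0.0576) < 1e-4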
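In every optim.py:476 entry the logged threshold is exactly Clipping_scale times the middle grad-norm quartile (e.g. 2.0 * 9.240e+01 = 1.848e+02), which suggests gradients are clipped relative to a running median of recent gradient norms; consistently, the one percent-clipped=1.0 entry in this stretch coincides with a max norm (1.898e+02) above its threshold (1.845e+02). A hedged sketch of such a rule, assuming this reading of the log (this is not the actual optimizer implementation):

    import torch

    def clip_relative_to_median(params, recent_norms: torch.Tensor,
                                clipping_scale: float = 2.0):
        # Threshold tracks the median of recently observed gradient norms,
        # matching threshold = 2.0 * (50% quartile) in the entries above.
        threshold = clipping_scale * recent_norms.median()
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        if total_norm > threshold:  # such batches count toward "percent-clipped"
            for g in grads:
                g.mul_(threshold / total_norm)
        return total_norm, threshold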
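The recurring "Exclude cut with ID unbalanced/..." warnings all describe the same situation: a 1-second AudioSet clip yields 100 feature frames, the factor-4 subsampling frontend reduces that to 23 encoder frames, and the 24-token dummy transcript is then longer than the encoder output, which a transducer loss cannot align. A sketch of that filter, matching the numbers in the warnings (the helper name is illustrative, and the subsampled-length formula is an assumption consistent with 100 -> 23):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Encoder output length after two stride-2 convolutional
        # subsampling stages; for 100 input frames this gives 23.
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        return frames_after >= num_tokens

    assert not keep_cut(100, 24)  # 23 frames < 24 tokens: excluded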
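Most scaling.py:213 entries report ScheduledFloat values: regularization hyper-parameters (balancer probabilities, skip rates, dropout_p) that are functions of batch_count rather than constants, which is why each entry carries the current batch_count. A rough sketch of a piecewise-linear schedule in that spirit (illustrative only; the real scaling.py class carries more machinery):

    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        # points: (batch_count, value) breakpoints sorted by batch_count;
        # values interpolate linearly between breakpoints, flat outside.
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0

    # e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches
    # has long since settled at 0.0 by batch_count ~ 3.15e6, as seen above.
    assert scheduled_float(3_146_186.0, [(0.0, 0.5), (20_000.0, 0.0)]) == 0.0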
2023-11-27 16:44:00,564 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 16:44:01,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3146186.6666666665, ans=0.0 2023-11-27 16:44:18,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3146253.3333333335, ans=0.0 2023-11-27 16:44:22,687 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-27 16:44:53,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3146453.3333333335, ans=0.125 2023-11-27 16:44:56,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3146520.0, ans=0.025 2023-11-27 16:44:57,630 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3050, loss[loss=0.0682, simple_loss=0.09541, pruned_loss=0.01156, audio_tagging_loss=0.00893, over 15766.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.0912, pruned_loss=0.01274, audio_tagging_loss=0.008823, over 3053152.95 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:45:07,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-11-27 16:45:10,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3146586.6666666665, ans=0.125 2023-11-27 16:45:15,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3146586.6666666665, ans=0.1 2023-11-27 16:45:18,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3146586.6666666665, ans=0.0 2023-11-27 16:45:20,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-27 16:45:25,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3146653.3333333335, ans=0.1 2023-11-27 16:45:31,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3146653.3333333335, ans=0.0 2023-11-27 16:45:37,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.960e+01 9.776e+01 1.063e+02 1.311e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 16:45:37,501 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:45:57,822 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3100, loss[loss=0.05369, simple_loss=0.0684, pruned_loss=0.0107, audio_tagging_loss=0.00879, over 14962.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09011, pruned_loss=0.0126, audio_tagging_loss=0.008918, over 3057281.91 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:46:01,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3146853.3333333335, ans=0.0 2023-11-27 16:46:05,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-27 16:46:16,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3146920.0, ans=0.125 2023-11-27 16:46:21,468 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-27 16:46:33,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=12.0 2023-11-27 16:46:53,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3147120.0, ans=0.0 2023-11-27 16:46:55,367 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3150, loss[loss=0.07069, simple_loss=0.08578, pruned_loss=0.01524, audio_tagging_loss=0.01257, over 15390.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0909, pruned_loss=0.01275, audio_tagging_loss=0.008929, over 3046368.38 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:46:56,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-27 16:46:59,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3147186.6666666665, ans=0.2 2023-11-27 16:47:02,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3147186.6666666665, ans=0.1 2023-11-27 16:47:18,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-27 16:47:30,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147386.6666666665, ans=0.1 2023-11-27 16:47:32,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.709e+01 9.238e+01 9.861e+01 1.387e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 16:47:51,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-27 16:47:52,701 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3200, loss[loss=0.05767, simple_loss=0.07786, pruned_loss=0.009739, audio_tagging_loss=0.009002, over 15113.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0906, pruned_loss=0.0127, audio_tagging_loss=0.009046, over 3055277.31 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:48:15,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-27 16:48:20,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3147653.3333333335, ans=0.2 2023-11-27 16:48:50,363 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3250, loss[loss=0.0688, simple_loss=0.09502, pruned_loss=0.01248, audio_tagging_loss=0.008816, over 15139.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09095, pruned_loss=0.01268, audio_tagging_loss=0.009052, over 3054026.84 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:48:56,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147853.3333333335, ans=0.1 2023-11-27 16:49:04,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3147920.0, ans=0.0 2023-11-27 16:49:08,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3147920.0, ans=15.0 2023-11-27 16:49:10,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3147920.0, ans=0.125 2023-11-27 16:49:12,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3147986.6666666665, ans=0.125 2023-11-27 16:49:14,160 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-27 16:49:28,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.717e+01 9.369e+01 9.960e+01 1.192e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 16:49:44,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3148120.0, ans=0.025 2023-11-27 16:49:46,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3148120.0, ans=0.2 2023-11-27 16:49:48,408 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3300, loss[loss=0.06625, simple_loss=0.09068, pruned_loss=0.01268, audio_tagging_loss=0.008222, over 15026.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09185, pruned_loss=0.01268, audio_tagging_loss=0.009039, over 3055976.16 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:49:53,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3148186.6666666665, ans=0.0 2023-11-27 16:50:11,736 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-27 16:50:14,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3148320.0, ans=0.0 2023-11-27 16:50:46,544 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3350, loss[loss=0.06861, simple_loss=0.09126, pruned_loss=0.01513, audio_tagging_loss=0.007858, over 14935.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09139, pruned_loss=0.01269, audio_tagging_loss=0.008995, over 3055863.85 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:50:52,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3148520.0, ans=0.0 2023-11-27 16:50:54,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3148520.0, ans=0.1 2023-11-27 16:51:09,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-27 16:51:11,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3148653.3333333335, ans=0.0 2023-11-27 16:51:16,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3148653.3333333335, ans=0.09899494936611666 2023-11-27 16:51:21,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3148720.0, ans=0.125 2023-11-27 16:51:24,402 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.635e+01 9.292e+01 9.708e+01 1.105e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 16:51:25,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3148720.0, ans=0.0 2023-11-27 16:51:43,870 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3400, loss[loss=0.07246, simple_loss=0.09442, pruned_loss=0.0163, audio_tagging_loss=0.008952, over 15450.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09132, pruned_loss=0.01264, audio_tagging_loss=0.008893, over 3054623.72 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:51:47,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3148853.3333333335, ans=0.07 2023-11-27 16:52:00,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=22.5 2023-11-27 16:52:07,221 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-27 16:52:38,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3149120.0, ans=15.0 2023-11-27 16:52:41,867 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3450, loss[loss=0.07355, simple_loss=0.1056, pruned_loss=0.01092, audio_tagging_loss=0.009829, over 15425.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09169, pruned_loss=0.01279, audio_tagging_loss=0.008824, over 3051042.46 frames. 
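
The [scaling.py:1022] Whitening records compare a per-module statistic against a limit; the corrective gradient is only applied while the metric exceeds the limit, so a line like "metric=7.62 vs. limit=12.0" documents an in-bounds activation. A sketch of one plausible such metric -- the eigenvalue dispersion of the feature covariance, equal to 1.0 for perfectly white (isotropic) features -- offered as an assumption about the flavor of the statistic, not its exact definition:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Ratio of the mean squared
        # covariance eigenvalue to the squared mean eigenvalue: 1.0 iff
        # all eigenvalues are equal (white), larger when ill-conditioned.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    feats = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated
    print(float(whitening_metric(feats)))  # well above 1.0
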
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:52:52,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3149253.3333333335, ans=0.0 2023-11-27 16:53:03,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3149253.3333333335, ans=0.0 2023-11-27 16:53:05,298 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-27 16:53:18,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3149386.6666666665, ans=0.2 2023-11-27 16:53:20,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.772e+01 9.450e+01 1.013e+02 1.492e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 16:53:24,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3149386.6666666665, ans=0.1 2023-11-27 16:53:33,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-11-27 16:53:39,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2023-11-27 16:53:39,855 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3500, loss[loss=0.06491, simple_loss=0.07765, pruned_loss=0.01457, audio_tagging_loss=0.01152, over 14648.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09055, pruned_loss=0.01281, audio_tagging_loss=0.008818, over 3049513.26 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:54:02,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-11-27 16:54:03,469 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-27 16:54:13,267 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:54:26,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-27 16:54:37,434 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3550, loss[loss=0.05493, simple_loss=0.07578, pruned_loss=0.00884, audio_tagging_loss=0.008205, over 14525.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.0902, pruned_loss=0.01291, audio_tagging_loss=0.008806, over 3046062.43 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:54:58,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3149920.0, ans=0.0 2023-11-27 16:54:59,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:55:00,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.78 vs. 
limit=15.0 2023-11-27 16:55:00,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472500 2023-11-27 16:55:15,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.604e+01 9.006e+01 9.735e+01 1.232e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-27 16:55:35,507 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3600, loss[loss=0.08721, simple_loss=0.1097, pruned_loss=0.02373, audio_tagging_loss=0.008649, over 15393.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09, pruned_loss=0.0128, audio_tagging_loss=0.008798, over 3049547.01 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:55:44,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3150186.6666666665, ans=0.09899494936611666 2023-11-27 16:55:47,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-11-27 16:55:55,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3150253.3333333335, ans=0.125 2023-11-27 16:55:58,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472550 2023-11-27 16:56:12,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3150386.6666666665, ans=0.07 2023-11-27 16:56:12,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-27 16:56:19,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3150386.6666666665, ans=0.125 2023-11-27 16:56:33,330 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3650, loss[loss=0.05386, simple_loss=0.06528, pruned_loss=0.0103, audio_tagging_loss=0.01091, over 14714.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08955, pruned_loss=0.01263, audio_tagging_loss=0.008729, over 3043102.94 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:56:36,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3150520.0, ans=0.125 2023-11-27 16:56:53,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3150586.6666666665, ans=0.0 2023-11-27 16:56:56,743 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-27 16:57:11,659 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.799e+01 9.366e+01 9.854e+01 1.150e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 16:57:28,664 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:57:30,584 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3700, loss[loss=0.04651, simple_loss=0.05335, pruned_loss=0.006531, audio_tagging_loss=0.01331, over 14374.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08985, pruned_loss=0.01272, audio_tagging_loss=0.008745, over 3043343.17 frames. 
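
The [optim.py:476] lines print five order statistics (min, the three quartiles, max) of recently observed gradient norms, and the printed threshold is consistently Clipping_scale times the median -- in the record above, 2.0 * 9.006e+01 = 1.801e+02, and the same ratio holds in every such record in this section. A sketch of that bookkeeping, assuming the threshold really is derived this way:

    import torch

    def summarize_and_clip_norms(grad_norms: torch.Tensor,
                                 clipping_scale: float = 2.0) -> float:
        q = torch.quantile(grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * float(q[2])   # 2.0 * median
        pct = 100.0 * float((grad_norms > threshold).float().mean())
        print(f"grad-norm quartiles {[f'{v:.3e}' for v in q.tolist()]}, "
              f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")
        return threshold

    # Stand-in for a buffer of recent gradient norms around ~90:
    norms = 90.0 + 10.0 * torch.randn(200).abs()
    summarize_and_clip_norms(norms)  # threshold ~1.9e+02, percent-clipped=0.0
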
], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:57:32,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3150853.3333333335, ans=0.035 2023-11-27 16:57:53,943 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-27 16:57:55,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3150986.6666666665, ans=0.125 2023-11-27 16:57:58,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3150986.6666666665, ans=0.125 2023-11-27 16:58:00,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2023-11-27 16:58:09,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3151053.3333333335, ans=0.125 2023-11-27 16:58:10,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3151053.3333333335, ans=0.0 2023-11-27 16:58:15,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=12.0 2023-11-27 16:58:20,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3151120.0, ans=0.125 2023-11-27 16:58:28,663 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3750, loss[loss=0.05492, simple_loss=0.0722, pruned_loss=0.009227, audio_tagging_loss=0.009593, over 15110.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09036, pruned_loss=0.01277, audio_tagging_loss=0.008764, over 3051000.11 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:58:33,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3151186.6666666665, ans=0.1 2023-11-27 16:58:36,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2023-11-27 16:58:39,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5 2023-11-27 16:58:48,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:58:51,254 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472700 2023-11-27 16:58:54,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3151320.0, ans=10.0 2023-11-27 16:59:07,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.997e+01 9.607e+01 1.030e+02 1.522e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 16:59:12,469 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 16:59:26,481 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3800, loss[loss=0.06476, simple_loss=0.07993, pruned_loss=0.01435, audio_tagging_loss=0.01045, over 15773.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09065, pruned_loss=0.01271, audio_tagging_loss=0.00889, over 3061142.31 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:59:35,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3151520.0, ans=0.0 2023-11-27 16:59:41,069 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:59:45,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2023-11-27 16:59:49,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472750 2023-11-27 16:59:55,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2023-11-27 16:59:57,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3151653.3333333335, ans=0.0 2023-11-27 17:00:11,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3151786.6666666665, ans=0.1 2023-11-27 17:00:23,108 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3850, loss[loss=0.04947, simple_loss=0.06013, pruned_loss=0.008765, audio_tagging_loss=0.01064, over 14500.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08975, pruned_loss=0.01249, audio_tagging_loss=0.008973, over 3056257.66 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:00:36,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3151920.0, ans=0.125 2023-11-27 17:00:46,459 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472800 2023-11-27 17:01:02,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.917e+01 9.426e+01 9.996e+01 1.241e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 17:01:06,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3152053.3333333335, ans=0.1 2023-11-27 17:01:11,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2023-11-27 17:01:20,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-27 17:01:21,848 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3900, loss[loss=0.05578, simple_loss=0.0784, pruned_loss=0.007648, audio_tagging_loss=0.008929, over 14468.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08956, pruned_loss=0.01256, audio_tagging_loss=0.00902, over 3050779.11 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:01:23,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. 
limit=15.0 2023-11-27 17:01:44,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-27 17:01:45,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3152320.0, ans=0.125 2023-11-27 17:01:46,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3152320.0, ans=0.0 2023-11-27 17:01:58,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3152386.6666666665, ans=0.0 2023-11-27 17:02:12,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3152453.3333333335, ans=0.0 2023-11-27 17:02:18,846 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 3950, loss[loss=0.07004, simple_loss=0.09804, pruned_loss=0.01147, audio_tagging_loss=0.009549, over 15417.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08933, pruned_loss=0.0125, audio_tagging_loss=0.009077, over 3048588.67 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:02:25,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3152520.0, ans=0.2 2023-11-27 17:02:25,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3152520.0, ans=0.125 2023-11-27 17:02:26,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3152520.0, ans=0.125 2023-11-27 17:02:27,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3152520.0, ans=0.2 2023-11-27 17:02:41,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-27 17:02:45,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0 2023-11-27 17:02:58,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.787e+01 9.336e+01 1.021e+02 1.304e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:02:59,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-11-27 17:03:09,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3152786.6666666665, ans=0.2 2023-11-27 17:03:16,287 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4000, loss[loss=0.06848, simple_loss=0.08954, pruned_loss=0.01331, audio_tagging_loss=0.01039, over 14034.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09005, pruned_loss=0.01264, audio_tagging_loss=0.00907, over 3044915.29 frames. 
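
Note that tot_loss is not a plain epoch-to-date average: its "over N frames" count hovers around 3.05M instead of growing with each batch, which is consistent with an exponentially decayed running sum of (loss, frames) pairs whose effective window is a few hundred batches. A sketch of that accumulation; the decay constant 1 - 1/200 is an assumption, chosen because it makes the frame count settle near 200 * ~15.3k = ~3.06M, as observed:

    # Illustrative running tracker: decayed sums of per-batch loss and frames.
    reset_interval = 200
    tot_loss_sum = tot_frames = 0.0
    for step in range(2000):                      # stand-in training loop
        batch_frames = 15300.0                    # ~frames per batch here
        batch_loss_sum = 0.067 * batch_frames     # per-frame loss ~0.067
        decay = 1.0 - 1.0 / reset_interval
        tot_loss_sum = tot_loss_sum * decay + batch_loss_sum
        tot_frames = tot_frames * decay + batch_frames
    print(f"tot_loss={tot_loss_sum / tot_frames:.5f} "
          f"over {tot_frames:.2f} frames")        # ~3.06e6 frames
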
], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:03:39,930 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-27 17:03:49,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3152986.6666666665, ans=0.125 2023-11-27 17:03:59,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3153053.3333333335, ans=0.0 2023-11-27 17:04:13,844 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4050, loss[loss=0.06478, simple_loss=0.09507, pruned_loss=0.01134, audio_tagging_loss=0.0059, over 15226.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09049, pruned_loss=0.01283, audio_tagging_loss=0.00912, over 3047348.61 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:04:21,553 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:04:29,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3153253.3333333335, ans=0.125 2023-11-27 17:04:34,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3153253.3333333335, ans=0.125 2023-11-27 17:04:34,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3153253.3333333335, ans=0.0 2023-11-27 17:04:37,474 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-27 17:04:41,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3153320.0, ans=0.0 2023-11-27 17:04:47,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153386.6666666665, ans=0.1 2023-11-27 17:04:53,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 9.021e+01 9.575e+01 1.043e+02 1.402e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:04:53,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3153386.6666666665, ans=0.125 2023-11-27 17:05:12,225 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4100, loss[loss=0.05625, simple_loss=0.07882, pruned_loss=0.009901, audio_tagging_loss=0.006939, over 16568.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09064, pruned_loss=0.0128, audio_tagging_loss=0.009065, over 3055423.54 frames. 
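
The grad_scale field is the automatic mixed-precision loss scale used for fp16 training: it is halved whenever an overflow is detected in the scaled gradients and grown back after a run of clean steps, hence the oscillation between 16.0 and 32.0 across these records. A minimal illustration of the generic PyTorch mechanism (requires a CUDA device; this is not the training script itself):

    import torch

    model = torch.nn.Linear(10, 10).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    x = torch.randn(4, 10, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backprop with the scaled loss
    scaler.step(opt)               # unscales grads; skips the step on inf/nan
    scaler.update()                # halves the scale on overflow, else may grow it
    print(scaler.get_scale())      # 32.0 if the step was clean
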
], batch size: 64, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:05:20,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3153520.0, ans=0.0 2023-11-27 17:05:34,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-27 17:05:38,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3153653.3333333335, ans=0.05 2023-11-27 17:05:56,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3153720.0, ans=0.2 2023-11-27 17:05:57,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153786.6666666665, ans=0.1 2023-11-27 17:06:05,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-27 17:06:09,986 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4150, loss[loss=0.06319, simple_loss=0.09077, pruned_loss=0.009053, audio_tagging_loss=0.008752, over 14803.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09068, pruned_loss=0.01284, audio_tagging_loss=0.009007, over 3050253.49 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:06:32,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-27 17:06:35,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3153986.6666666665, ans=0.0 2023-11-27 17:06:37,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=12.0 2023-11-27 17:06:40,433 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:06:43,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3153986.6666666665, ans=0.125 2023-11-27 17:06:45,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3154053.3333333335, ans=0.5 2023-11-27 17:06:47,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-27 17:06:49,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3154053.3333333335, ans=0.125 2023-11-27 17:06:50,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.636e+01 9.410e+01 1.013e+02 1.260e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 17:06:55,205 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 17:06:58,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3154120.0, ans=0.0 2023-11-27 17:07:07,797 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4200, loss[loss=0.07663, simple_loss=0.1088, pruned_loss=0.01383, audio_tagging_loss=0.008409, over 14889.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0893, pruned_loss=0.01238, audio_tagging_loss=0.008945, over 3055190.77 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:07:31,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473150 2023-11-27 17:08:05,638 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4250, loss[loss=0.07667, simple_loss=0.1158, pruned_loss=0.01413, audio_tagging_loss=0.004649, over 15713.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09099, pruned_loss=0.01262, audio_tagging_loss=0.008744, over 3054957.68 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:08:06,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3154520.0, ans=0.1 2023-11-27 17:08:19,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3154586.6666666665, ans=0.2 2023-11-27 17:08:19,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3154586.6666666665, ans=0.125 2023-11-27 17:08:24,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5 2023-11-27 17:08:25,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-27 17:08:28,882 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473200 2023-11-27 17:08:46,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.728e+01 9.330e+01 9.912e+01 1.216e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 17:08:48,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.84 vs. limit=10.0 2023-11-27 17:09:04,284 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4300, loss[loss=0.07134, simple_loss=0.1014, pruned_loss=0.01452, audio_tagging_loss=0.006136, over 15311.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.0908, pruned_loss=0.01256, audio_tagging_loss=0.008638, over 3056488.42 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:09:08,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-27 17:09:21,455 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:09:27,293 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-27 17:09:32,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3154986.6666666665, ans=0.1 2023-11-27 17:09:48,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. 
limit=22.5 2023-11-27 17:09:54,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3155120.0, ans=0.125 2023-11-27 17:10:00,573 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4350, loss[loss=0.04577, simple_loss=0.05519, pruned_loss=0.006561, audio_tagging_loss=0.01161, over 14966.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09143, pruned_loss=0.01267, audio_tagging_loss=0.008591, over 3055942.25 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:10:00,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3155186.6666666665, ans=0.07 2023-11-27 17:10:19,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3155253.3333333335, ans=0.2 2023-11-27 17:10:20,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3155253.3333333335, ans=0.125 2023-11-27 17:10:24,466 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-27 17:10:29,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3155320.0, ans=0.2 2023-11-27 17:10:30,127 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:10:33,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3155320.0, ans=0.0 2023-11-27 17:10:41,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.765e+01 9.493e+01 1.043e+02 1.484e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 17:10:58,646 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4400, loss[loss=0.05734, simple_loss=0.0789, pruned_loss=0.008231, audio_tagging_loss=0.009656, over 15600.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09157, pruned_loss=0.01282, audio_tagging_loss=0.00862, over 3057729.38 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:10:59,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2023-11-27 17:11:21,902 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-27 17:11:41,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3155720.0, ans=0.125 2023-11-27 17:11:57,080 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4450, loss[loss=0.07095, simple_loss=0.09977, pruned_loss=0.01389, audio_tagging_loss=0.007175, over 14764.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09165, pruned_loss=0.01289, audio_tagging_loss=0.008552, over 3056523.46 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:12:01,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.38 vs. 
limit=15.0 2023-11-27 17:12:09,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155920.0, ans=0.1 2023-11-27 17:12:10,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3155920.0, ans=0.1 2023-11-27 17:12:12,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3155920.0, ans=0.125 2023-11-27 17:12:19,391 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-27 17:12:29,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3156053.3333333335, ans=0.015 2023-11-27 17:12:38,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.835e+01 9.403e+01 1.018e+02 2.786e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-27 17:12:40,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156053.3333333335, ans=0.125 2023-11-27 17:12:46,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2023-11-27 17:12:54,381 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4500, loss[loss=0.05528, simple_loss=0.07778, pruned_loss=0.008195, audio_tagging_loss=0.008191, over 14907.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09162, pruned_loss=0.01292, audio_tagging_loss=0.008523, over 3057525.90 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:13:17,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-27 17:13:17,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3156320.0, ans=0.0 2023-11-27 17:13:36,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3156386.6666666665, ans=0.07 2023-11-27 17:13:52,325 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4550, loss[loss=0.07092, simple_loss=0.09866, pruned_loss=0.01535, audio_tagging_loss=0.006241, over 15200.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09146, pruned_loss=0.01287, audio_tagging_loss=0.008563, over 3055195.63 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:14:03,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3156586.6666666665, ans=0.125 2023-11-27 17:14:08,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3156586.6666666665, ans=0.0 2023-11-27 17:14:13,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156586.6666666665, ans=0.125 2023-11-27 17:14:15,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-27 17:14:22,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=8.0 2023-11-27 17:14:33,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.569e+01 9.256e+01 9.932e+01 4.356e+02, threshold=1.851e+02, percent-clipped=1.0 2023-11-27 17:14:39,262 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:14:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3156786.6666666665, ans=0.0 2023-11-27 17:14:41,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3156786.6666666665, ans=0.1 2023-11-27 17:14:49,599 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4600, loss[loss=0.0461, simple_loss=0.05636, pruned_loss=0.004921, audio_tagging_loss=0.013, over 14898.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09114, pruned_loss=0.01282, audio_tagging_loss=0.008624, over 3057977.68 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:15:00,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-27 17:15:01,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-27 17:15:03,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3156920.0, ans=0.125 2023-11-27 17:15:12,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473550 2023-11-27 17:15:22,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3156986.6666666665, ans=0.125 2023-11-27 17:15:22,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3156986.6666666665, ans=0.0 2023-11-27 17:15:39,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-27 17:15:47,487 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4650, loss[loss=0.06531, simple_loss=0.07869, pruned_loss=0.01575, audio_tagging_loss=0.01022, over 14092.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09004, pruned_loss=0.01264, audio_tagging_loss=0.0087, over 3052653.31 frames. 
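
These WARNINGs show the guard that drops degenerate AudioSet cuts: a 1-second cut has 100 feature frames, only 23 frames survive 4x subsampling, and that is shorter than the 24-token placeholder transcript, so the transducer loss would be undefined. A sketch of such a filter; the frontend arithmetic ((T - 7) // 2 + 1) // 2 is an assumption that reproduces the logged 100 -> 23:

    # Hypothetical validity filter in the spirit of train_asr.py:1481.
    def frames_after_subsampling(t: int) -> int:
        # Assumed Conv2d frontend arithmetic for ~4x subsampling.
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # The transducer needs at least one frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> excluded, as logged
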
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:15:49,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157186.6666666665, ans=0.1 2023-11-27 17:15:50,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3157186.6666666665, ans=0.2 2023-11-27 17:15:54,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157186.6666666665, ans=0.1 2023-11-27 17:16:10,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473600 2023-11-27 17:16:14,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3157320.0, ans=0.0 2023-11-27 17:16:22,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3157386.6666666665, ans=0.125 2023-11-27 17:16:29,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 8.758e+01 9.328e+01 1.030e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 17:16:29,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3157386.6666666665, ans=0.125 2023-11-27 17:16:30,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3157386.6666666665, ans=0.125 2023-11-27 17:16:44,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3157520.0, ans=0.125 2023-11-27 17:16:45,827 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4700, loss[loss=0.07814, simple_loss=0.1109, pruned_loss=0.01441, audio_tagging_loss=0.008284, over 15408.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09071, pruned_loss=0.01278, audio_tagging_loss=0.008782, over 3054129.19 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:16:46,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3157520.0, ans=0.0 2023-11-27 17:16:48,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3157520.0, ans=0.0 2023-11-27 17:16:52,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3157520.0, ans=0.2 2023-11-27 17:17:07,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3157653.3333333335, ans=0.125 2023-11-27 17:17:08,436 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-27 17:17:18,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3157720.0, ans=0.0 2023-11-27 17:17:41,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.59 vs. limit=6.0 2023-11-27 17:17:43,358 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4750, loss[loss=0.05088, simple_loss=0.06004, pruned_loss=0.0087, audio_tagging_loss=0.01216, over 15395.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09027, pruned_loss=0.01273, audio_tagging_loss=0.008909, over 3048302.72 frames. 
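
The slow drift of the printed learning rate (1.70e-03 earlier in the epoch, 1.69e-03 here) is consistent with icefall's Eden schedule, which decays with both batch count and epoch count. A sketch with assumed hyperparameters (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) that lands on the logged value at roughly this point in training:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, batch=472600, epoch=39):.2e}")  # 1.69e-03
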
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:17:52,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3157853.3333333335, ans=0.125 2023-11-27 17:17:53,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3157920.0, ans=0.125 2023-11-27 17:18:02,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2023-11-27 17:18:03,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-27 17:18:06,435 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-27 17:18:24,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.859e+01 9.575e+01 1.045e+02 1.210e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:18:24,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3158053.3333333335, ans=0.125 2023-11-27 17:18:29,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158120.0, ans=0.1 2023-11-27 17:18:30,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3158120.0, ans=0.125 2023-11-27 17:18:40,205 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4800, loss[loss=0.07736, simple_loss=0.1177, pruned_loss=0.01091, audio_tagging_loss=0.007585, over 16559.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08969, pruned_loss=0.01245, audio_tagging_loss=0.008972, over 3046333.92 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:18:55,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3158253.3333333335, ans=0.5 2023-11-27 17:18:58,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3158253.3333333335, ans=0.2 2023-11-27 17:19:03,904 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-27 17:19:06,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3158320.0, ans=0.2 2023-11-27 17:19:07,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3158320.0, ans=0.0 2023-11-27 17:19:38,168 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4850, loss[loss=0.04247, simple_loss=0.05469, pruned_loss=0.004857, audio_tagging_loss=0.01026, over 14767.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08912, pruned_loss=0.01237, audio_tagging_loss=0.009117, over 3045656.62 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:19:50,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3158586.6666666665, ans=0.125 2023-11-27 17:19:55,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3158586.6666666665, ans=0.125 2023-11-27 17:19:59,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3158586.6666666665, ans=0.0 2023-11-27 17:20:01,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-27 17:20:10,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3158653.3333333335, ans=0.0 2023-11-27 17:20:20,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.680e+01 9.364e+01 9.927e+01 1.620e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 17:20:36,677 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4900, loss[loss=0.0611, simple_loss=0.07914, pruned_loss=0.0123, audio_tagging_loss=0.009228, over 15024.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08903, pruned_loss=0.01221, audio_tagging_loss=0.009081, over 3044391.31 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:21:00,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-27 17:21:06,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=22.5 2023-11-27 17:21:11,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3159053.3333333335, ans=0.0 2023-11-27 17:21:26,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3159120.0, ans=0.125 2023-11-27 17:21:34,309 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 4950, loss[loss=0.08298, simple_loss=0.1045, pruned_loss=0.01693, audio_tagging_loss=0.01381, over 15483.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08916, pruned_loss=0.01238, audio_tagging_loss=0.008977, over 3042340.67 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:21:57,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-27 17:22:00,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3159320.0, ans=0.125 2023-11-27 17:22:09,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3159386.6666666665, ans=0.0 2023-11-27 17:22:16,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.677e+01 9.528e+01 1.024e+02 1.553e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 17:22:31,927 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5000, loss[loss=0.06967, simple_loss=0.09516, pruned_loss=0.01532, audio_tagging_loss=0.006766, over 16047.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08976, pruned_loss=0.01249, audio_tagging_loss=0.008839, over 3047043.51 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:22:40,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3159520.0, ans=0.2 2023-11-27 17:22:45,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3159586.6666666665, ans=0.125 2023-11-27 17:22:55,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-27 17:23:06,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3159720.0, ans=0.0 2023-11-27 17:23:18,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3159786.6666666665, ans=0.2 2023-11-27 17:23:28,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3159853.3333333335, ans=0.0 2023-11-27 17:23:29,488 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5050, loss[loss=0.03956, simple_loss=0.04781, pruned_loss=0.006165, audio_tagging_loss=0.009491, over 14390.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08865, pruned_loss=0.01218, audio_tagging_loss=0.00883, over 3044846.87 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:23:37,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3159853.3333333335, ans=0.2 2023-11-27 17:23:40,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3159920.0, ans=0.1 2023-11-27 17:23:52,209 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474000 2023-11-27 17:23:56,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-27 17:24:00,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3159986.6666666665, ans=0.0 2023-11-27 17:24:12,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.599e+01 9.260e+01 9.891e+01 1.238e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 17:24:27,717 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5100, loss[loss=0.07424, simple_loss=0.1072, pruned_loss=0.01148, audio_tagging_loss=0.009163, over 16209.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08833, pruned_loss=0.01224, audio_tagging_loss=0.008841, over 3042558.50 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:24:38,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3160253.3333333335, ans=0.1 2023-11-27 17:24:46,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=15.0 2023-11-27 17:24:51,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474050 2023-11-27 17:25:16,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.47 vs. limit=10.0 2023-11-27 17:25:24,993 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5150, loss[loss=0.05729, simple_loss=0.08379, pruned_loss=0.007748, audio_tagging_loss=0.00765, over 15051.00 frames. 
], tot_loss[loss=0.06489, simple_loss=0.08793, pruned_loss=0.01209, audio_tagging_loss=0.008841, over 3046087.51 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:25:26,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3160520.0, ans=0.2 2023-11-27 17:25:48,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-27 17:26:07,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.651e+01 9.333e+01 9.963e+01 1.109e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:26:12,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3160786.6666666665, ans=0.0 2023-11-27 17:26:22,473 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5200, loss[loss=0.06827, simple_loss=0.08494, pruned_loss=0.01608, audio_tagging_loss=0.009726, over 14478.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0887, pruned_loss=0.01226, audio_tagging_loss=0.00882, over 3039642.80 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:26:25,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=12.0 2023-11-27 17:26:28,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3160853.3333333335, ans=0.2 2023-11-27 17:26:37,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3160920.0, ans=0.04949747468305833 2023-11-27 17:26:45,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-27 17:26:49,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2023-11-27 17:26:56,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3161053.3333333335, ans=0.125 2023-11-27 17:27:02,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3161053.3333333335, ans=0.125 2023-11-27 17:27:18,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3161120.0, ans=0.125 2023-11-27 17:27:20,055 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5250, loss[loss=0.08653, simple_loss=0.1099, pruned_loss=0.02236, audio_tagging_loss=0.009219, over 14693.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08945, pruned_loss=0.01252, audio_tagging_loss=0.008689, over 3046389.29 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:27:40,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.35 vs. 
limit=12.0 2023-11-27 17:27:42,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-27 17:27:44,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3161320.0, ans=0.0 2023-11-27 17:27:44,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3161320.0, ans=0.0 2023-11-27 17:27:49,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3161320.0, ans=0.2 2023-11-27 17:27:51,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3161320.0, ans=0.125 2023-11-27 17:27:55,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2023-11-27 17:27:58,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3161386.6666666665, ans=0.125 2023-11-27 17:28:03,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 8.718e+01 9.401e+01 1.041e+02 1.435e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 17:28:08,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-27 17:28:17,163 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5300, loss[loss=0.07451, simple_loss=0.1043, pruned_loss=0.0138, audio_tagging_loss=0.008578, over 14841.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08971, pruned_loss=0.01262, audio_tagging_loss=0.008726, over 3044574.53 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:28:23,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3161520.0, ans=0.125 2023-11-27 17:28:33,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3161586.6666666665, ans=0.07 2023-11-27 17:28:35,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161586.6666666665, ans=0.1 2023-11-27 17:28:40,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474250 2023-11-27 17:28:44,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3161653.3333333335, ans=0.125 2023-11-27 17:28:49,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2023-11-27 17:29:01,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3161720.0, ans=0.125 2023-11-27 17:29:10,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2023-11-27 17:29:14,711 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5350, loss[loss=0.06543, simple_loss=0.08218, pruned_loss=0.0157, audio_tagging_loss=0.008635, over 15003.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09016, pruned_loss=0.01272, audio_tagging_loss=0.008642, over 3044011.29 frames. 
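
Incidentally, the oddly precise skip-rate values that recur in these records are sqrt(2)-scaled round numbers: ans=0.09899494936611666 is 0.07 * sqrt(2) and ans=0.04949747468305833 is 0.035 * sqrt(2), presumably a base rate rescaled by a fixed factor. Quick check:

    import math
    print(0.07 * math.sqrt(2))   # 0.09899494936611666, as logged
    print(0.035 * math.sqrt(2))  # 0.04949747468305833, as logged
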
], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:29:30,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161920.0, ans=0.1 2023-11-27 17:29:38,005 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474300 2023-11-27 17:29:55,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3162053.3333333335, ans=0.0 2023-11-27 17:29:57,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.46 vs. limit=15.0 2023-11-27 17:29:57,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.549e+01 9.139e+01 9.970e+01 1.797e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-27 17:30:01,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.12 vs. limit=5.0 2023-11-27 17:30:13,036 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5400, loss[loss=0.08463, simple_loss=0.1229, pruned_loss=0.01687, audio_tagging_loss=0.006297, over 15828.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09003, pruned_loss=0.01261, audio_tagging_loss=0.00876, over 3051382.52 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:30:22,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3162253.3333333335, ans=0.125 2023-11-27 17:30:35,265 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474350 2023-11-27 17:30:36,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3162320.0, ans=0.1 2023-11-27 17:31:09,435 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5450, loss[loss=0.06297, simple_loss=0.0847, pruned_loss=0.01281, audio_tagging_loss=0.007805, over 15650.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09021, pruned_loss=0.01265, audio_tagging_loss=0.008738, over 3051240.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:31:33,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-27 17:31:46,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3162720.0, ans=0.125 2023-11-27 17:31:49,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.30 vs. limit=12.0 2023-11-27 17:31:52,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3162720.0, ans=0.125 2023-11-27 17:31:53,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.734e+01 9.322e+01 1.014e+02 1.420e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 17:32:07,532 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5500, loss[loss=0.05636, simple_loss=0.07465, pruned_loss=0.009881, audio_tagging_loss=0.009156, over 15993.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09054, pruned_loss=0.0127, audio_tagging_loss=0.008692, over 3050289.05 frames. 
], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:32:27,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3162920.0, ans=0.125 2023-11-27 17:32:30,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-27 17:32:44,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3163053.3333333335, ans=0.125 2023-11-27 17:32:50,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3163053.3333333335, ans=0.95 2023-11-27 17:33:05,370 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5550, loss[loss=0.06586, simple_loss=0.08481, pruned_loss=0.01353, audio_tagging_loss=0.009926, over 14694.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09062, pruned_loss=0.01276, audio_tagging_loss=0.008757, over 3044840.14 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:33:13,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3163186.6666666665, ans=0.0 2023-11-27 17:33:22,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3163253.3333333335, ans=0.125 2023-11-27 17:33:27,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.46 vs. limit=22.5 2023-11-27 17:33:27,873 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-27 17:33:35,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3163320.0, ans=0.1 2023-11-27 17:33:40,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3163386.6666666665, ans=0.125 2023-11-27 17:33:49,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.657e+01 9.312e+01 9.840e+01 1.170e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 17:34:02,489 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5600, loss[loss=0.05187, simple_loss=0.06194, pruned_loss=0.01031, audio_tagging_loss=0.01059, over 14974.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09021, pruned_loss=0.01265, audio_tagging_loss=0.008889, over 3049644.73 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:34:17,553 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:34:25,498 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474550 2023-11-27 17:34:43,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3163720.0, ans=0.05 2023-11-27 17:34:47,602 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 17:34:50,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3163786.6666666665, ans=0.0 2023-11-27 17:34:59,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3163853.3333333335, ans=0.0 2023-11-27 17:34:59,996 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5650, loss[loss=0.05989, simple_loss=0.0788, pruned_loss=0.01178, audio_tagging_loss=0.008708, over 15276.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08989, pruned_loss=0.01265, audio_tagging_loss=0.008925, over 3052524.32 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:35:06,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3163853.3333333335, ans=0.0 2023-11-27 17:35:14,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-27 17:35:23,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474600 2023-11-27 17:35:26,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3163986.6666666665, ans=0.125 2023-11-27 17:35:28,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3163986.6666666665, ans=0.015 2023-11-27 17:35:28,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3163986.6666666665, ans=0.2 2023-11-27 17:35:45,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.679e+01 9.217e+01 1.003e+02 1.541e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 17:35:58,325 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5700, loss[loss=0.0461, simple_loss=0.06258, pruned_loss=0.006184, audio_tagging_loss=0.008622, over 14568.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08918, pruned_loss=0.01252, audio_tagging_loss=0.008942, over 3049148.36 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:36:20,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474650 2023-11-27 17:36:20,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3164320.0, ans=0.1 2023-11-27 17:36:26,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3164320.0, ans=0.125 2023-11-27 17:36:36,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3164386.6666666665, ans=0.125 2023-11-27 17:36:39,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3164386.6666666665, ans=0.0 2023-11-27 17:36:45,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2023-11-27 17:36:46,182 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:36:47,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. 
limit=15.0 2023-11-27 17:36:55,055 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5750, loss[loss=0.06743, simple_loss=0.08887, pruned_loss=0.01396, audio_tagging_loss=0.009034, over 14722.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08902, pruned_loss=0.01251, audio_tagging_loss=0.008829, over 3053303.65 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:37:12,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3164586.6666666665, ans=0.125 2023-11-27 17:37:18,037 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474700 2023-11-27 17:37:29,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3164720.0, ans=0.95 2023-11-27 17:37:39,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.634e+01 9.303e+01 1.008e+02 1.326e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 17:37:52,574 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5800, loss[loss=0.07037, simple_loss=0.09251, pruned_loss=0.01576, audio_tagging_loss=0.008353, over 15282.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08919, pruned_loss=0.01269, audio_tagging_loss=0.008683, over 3053582.02 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:37:57,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=22.5 2023-11-27 17:37:58,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=10.0 2023-11-27 17:38:01,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3164853.3333333335, ans=0.125 2023-11-27 17:38:03,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3164920.0, ans=0.125 2023-11-27 17:38:15,598 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474750 2023-11-27 17:38:23,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3164986.6666666665, ans=0.0 2023-11-27 17:38:27,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3165053.3333333335, ans=0.125 2023-11-27 17:38:41,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2023-11-27 17:38:43,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3165120.0, ans=0.0 2023-11-27 17:38:49,885 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5850, loss[loss=0.1006, simple_loss=0.144, pruned_loss=0.02356, audio_tagging_loss=0.005017, over 15331.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08983, pruned_loss=0.01285, audio_tagging_loss=0.008709, over 3048103.46 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:38:56,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3165186.6666666665, ans=0.125 2023-11-27 17:39:01,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3165253.3333333335, ans=0.0 2023-11-27 17:39:06,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3165253.3333333335, ans=0.125 2023-11-27 17:39:08,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=22.5 2023-11-27 17:39:13,017 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-27 17:39:21,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3165320.0, ans=0.125 2023-11-27 17:39:23,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3165386.6666666665, ans=0.0 2023-11-27 17:39:26,743 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:39:35,139 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.698e+01 9.361e+01 9.946e+01 1.172e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 17:39:42,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3165453.3333333335, ans=0.0 2023-11-27 17:39:48,451 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5900, loss[loss=0.07102, simple_loss=0.09167, pruned_loss=0.01403, audio_tagging_loss=0.01115, over 14793.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09053, pruned_loss=0.01301, audio_tagging_loss=0.008743, over 3047286.30 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:39:50,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2023-11-27 17:40:11,585 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-27 17:40:19,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3165653.3333333335, ans=10.0 2023-11-27 17:40:25,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3165720.0, ans=0.02 2023-11-27 17:40:31,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3165720.0, ans=0.125 2023-11-27 17:40:38,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-11-27 17:40:42,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3165786.6666666665, ans=0.2 2023-11-27 17:40:43,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. 
limit=15.0 2023-11-27 17:40:46,223 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 5950, loss[loss=0.07647, simple_loss=0.1001, pruned_loss=0.01651, audio_tagging_loss=0.009891, over 16637.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08963, pruned_loss=0.01275, audio_tagging_loss=0.00873, over 3045910.33 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:40:57,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3165920.0, ans=0.09899494936611666 2023-11-27 17:41:07,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2023-11-27 17:41:09,202 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474900 2023-11-27 17:41:09,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3165986.6666666665, ans=0.125 2023-11-27 17:41:26,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3166053.3333333335, ans=0.0 2023-11-27 17:41:30,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.669e+01 9.306e+01 1.020e+02 1.374e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 17:41:43,437 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6000, loss[loss=0.06764, simple_loss=0.09509, pruned_loss=0.01132, audio_tagging_loss=0.008775, over 16716.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0893, pruned_loss=0.01266, audio_tagging_loss=0.008729, over 3045687.97 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:41:43,437 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 17:41:59,248 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1370, 3.6445, 4.0698, 3.7312], device='cuda:1') 2023-11-27 17:42:06,711 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4772, 3.4156, 3.6945, 3.6280], device='cuda:1') 2023-11-27 17:42:18,059 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05751, simple_loss=0.05064, pruned_loss=0.005151, audio_tagging_loss=0.02703, over 4681554.00 frames. 
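The per-batch loss[...] fields in these records are single-batch values, while tot_loss[...] is a running frame-weighted average; its fractional frame counts (e.g. "over 3044574.53 frames") suggest the accumulated sums are decayed between updates. The validation record just logged is instead averaged over the full 4681554-frame dev set. A minimal sketch of this kind of frame-weighted tracking, with assumed names and an assumed decay scheme rather than the recipe's actual code:

from collections import defaultdict

class FrameWeightedTracker:
    # Hypothetical accumulator reproducing records like
    # "tot_loss[loss=0.0662, ... over 3044574.53 frames.]".
    def __init__(self) -> None:
        self.sums = defaultdict(float)   # per-metric running sums
        self.frames = 0.0                # running (decayed) frame count

    def update(self, batch_metrics: dict, num_frames: float,
               decay: float = 0.999) -> None:
        # Decay the old statistics first (assumed; this is what would make
        # the logged frame counts fractional), then add the new batch.
        for name in list(self.sums):
            self.sums[name] *= decay
        self.frames *= decay
        # batch_metrics holds per-frame averages, as printed in loss[...],
        # so weight each by the batch's frame count before accumulating.
        for name, value in batch_metrics.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        # Frame-normalized values, as printed in the tot_loss records.
        return {name: s / self.frames for name, s in self.sums.items()}

Feeding each batch's loss, simple_loss, pruned_loss and audio_tagging_loss through update() and printing averages() every 50 batches would produce smoothed values of the kind reported at batch 5200, 5250, 5300 and so on above.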
2023-11-27 17:42:18,059 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 17:42:18,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3166186.6666666665, ans=0.2 2023-11-27 17:42:21,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3166186.6666666665, ans=0.125 2023-11-27 17:42:30,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3166253.3333333335, ans=0.1 2023-11-27 17:42:34,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3166253.3333333335, ans=10.0 2023-11-27 17:42:40,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 474950 2023-11-27 17:42:40,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3166320.0, ans=0.02 2023-11-27 17:43:01,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-11-27 17:43:02,568 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:43:02,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3166453.3333333335, ans=0.1 2023-11-27 17:43:13,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3166453.3333333335, ans=0.0 2023-11-27 17:43:14,951 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6050, loss[loss=0.07456, simple_loss=0.107, pruned_loss=0.01251, audio_tagging_loss=0.008529, over 17284.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09047, pruned_loss=0.01275, audio_tagging_loss=0.008607, over 3044419.28 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:43:24,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2023-11-27 17:43:38,183 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475000 2023-11-27 17:43:51,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3166720.0, ans=0.125 2023-11-27 17:44:01,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.500e+01 9.097e+01 9.950e+01 1.272e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-27 17:44:01,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3166786.6666666665, ans=0.0 2023-11-27 17:44:03,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.51 vs. 
limit=10.0 2023-11-27 17:44:11,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-27 17:44:11,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3166853.3333333335, ans=0.0 2023-11-27 17:44:12,695 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6100, loss[loss=0.04731, simple_loss=0.06317, pruned_loss=0.006281, audio_tagging_loss=0.009448, over 14944.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09001, pruned_loss=0.01254, audio_tagging_loss=0.008651, over 3041914.05 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:44:22,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3166853.3333333335, ans=0.125 2023-11-27 17:44:25,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3166920.0, ans=0.125 2023-11-27 17:44:35,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-27 17:44:45,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3166986.6666666665, ans=0.0 2023-11-27 17:44:52,572 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:45:10,568 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6150, loss[loss=0.06137, simple_loss=0.08711, pruned_loss=0.01108, audio_tagging_loss=0.006731, over 16179.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09105, pruned_loss=0.01276, audio_tagging_loss=0.008628, over 3037278.64 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:45:22,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167253.3333333335, ans=0.1 2023-11-27 17:45:31,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3167253.3333333335, ans=0.125 2023-11-27 17:45:33,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2023-11-27 17:45:34,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-27 17:45:36,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3167320.0, ans=0.125 2023-11-27 17:45:39,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3167320.0, ans=0.2 2023-11-27 17:45:56,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.777e+01 9.490e+01 1.001e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 17:46:05,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3167453.3333333335, ans=0.2 2023-11-27 17:46:08,777 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6200, loss[loss=0.06943, simple_loss=0.1023, pruned_loss=0.01114, audio_tagging_loss=0.00716, over 15312.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09095, pruned_loss=0.01255, audio_tagging_loss=0.008662, over 3040327.49 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:46:28,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3167586.6666666665, ans=0.0 2023-11-27 17:46:29,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3167586.6666666665, ans=0.125 2023-11-27 17:46:31,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-27 17:46:50,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3167720.0, ans=0.125 2023-11-27 17:47:03,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0 2023-11-27 17:47:05,735 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6250, loss[loss=0.06779, simple_loss=0.0985, pruned_loss=0.01035, audio_tagging_loss=0.008192, over 14723.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09071, pruned_loss=0.01252, audio_tagging_loss=0.008764, over 3041638.78 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:47:12,206 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:47:20,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-11-27 17:47:28,421 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-27 17:47:28,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-27 17:47:52,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.766e+01 9.317e+01 1.001e+02 1.294e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 17:48:03,986 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6300, loss[loss=0.09316, simple_loss=0.1454, pruned_loss=0.01712, audio_tagging_loss=0.003352, over 15847.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09025, pruned_loss=0.01246, audio_tagging_loss=0.008785, over 3043817.23 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:48:09,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3168186.6666666665, ans=0.125 2023-11-27 17:48:15,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3168253.3333333335, ans=0.2 2023-11-27 17:48:27,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-27 17:48:28,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3168320.0, ans=0.125 2023-11-27 17:48:42,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2023-11-27 17:48:43,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. 
limit=10.0 2023-11-27 17:48:46,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3168386.6666666665, ans=0.125 2023-11-27 17:48:53,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3168453.3333333335, ans=0.1 2023-11-27 17:48:57,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3168453.3333333335, ans=0.125 2023-11-27 17:49:01,771 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6350, loss[loss=0.0766, simple_loss=0.1033, pruned_loss=0.0179, audio_tagging_loss=0.007063, over 14831.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08912, pruned_loss=0.01231, audio_tagging_loss=0.008941, over 3039646.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:49:05,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3168520.0, ans=0.2 2023-11-27 17:49:16,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3168586.6666666665, ans=0.0 2023-11-27 17:49:25,279 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-27 17:49:25,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3168653.3333333335, ans=0.0 2023-11-27 17:49:36,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3168720.0, ans=0.0 2023-11-27 17:49:44,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3168720.0, ans=0.04949747468305833 2023-11-27 17:49:47,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 8.715e+01 9.486e+01 1.017e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 17:50:00,009 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6400, loss[loss=0.07764, simple_loss=0.1078, pruned_loss=0.01344, audio_tagging_loss=0.01029, over 15857.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08967, pruned_loss=0.0124, audio_tagging_loss=0.009079, over 3036320.00 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:50:04,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs. 
limit=15.0 2023-11-27 17:50:20,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3168920.0, ans=0.04949747468305833 2023-11-27 17:50:22,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-27 17:50:30,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3168986.6666666665, ans=0.07 2023-11-27 17:50:36,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3169053.3333333335, ans=0.2 2023-11-27 17:50:46,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3169120.0, ans=0.1 2023-11-27 17:50:56,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3169186.6666666665, ans=0.2 2023-11-27 17:50:57,180 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6450, loss[loss=0.05952, simple_loss=0.07616, pruned_loss=0.01044, audio_tagging_loss=0.011, over 15429.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08924, pruned_loss=0.01243, audio_tagging_loss=0.009128, over 3039732.61 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:51:08,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2023-11-27 17:51:15,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3169253.3333333335, ans=15.0 2023-11-27 17:51:20,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-27 17:51:23,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3169320.0, ans=0.1 2023-11-27 17:51:28,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3169320.0, ans=0.125 2023-11-27 17:51:33,296 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:51:44,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.833e+01 9.242e+01 1.006e+02 1.317e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 17:51:51,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3169453.3333333335, ans=0.125 2023-11-27 17:51:54,638 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6500, loss[loss=0.08205, simple_loss=0.1124, pruned_loss=0.01802, audio_tagging_loss=0.007828, over 15804.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08927, pruned_loss=0.01252, audio_tagging_loss=0.009023, over 3043251.84 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:52:00,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3169520.0, ans=0.2 2023-11-27 17:52:15,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-11-27 17:52:15,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.56 vs. 
limit=15.0 2023-11-27 17:52:17,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3169653.3333333335, ans=0.125 2023-11-27 17:52:18,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475450 2023-11-27 17:52:27,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3169653.3333333335, ans=10.0 2023-11-27 17:52:39,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-27 17:52:42,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3169786.6666666665, ans=0.0 2023-11-27 17:52:53,675 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6550, loss[loss=0.08546, simple_loss=0.1158, pruned_loss=0.01859, audio_tagging_loss=0.008991, over 16019.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08935, pruned_loss=0.01252, audio_tagging_loss=0.008954, over 3044870.80 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:52:53,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3169853.3333333335, ans=0.125 2023-11-27 17:53:16,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475500 2023-11-27 17:53:40,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.596e+01 9.247e+01 9.962e+01 1.603e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 17:53:42,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2023-11-27 17:53:48,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3170120.0, ans=0.0 2023-11-27 17:53:51,297 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6600, loss[loss=0.072, simple_loss=0.1011, pruned_loss=0.01337, audio_tagging_loss=0.008068, over 15771.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08857, pruned_loss=0.01222, audio_tagging_loss=0.008902, over 3046905.59 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:53:53,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-27 17:54:13,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475550 2023-11-27 17:54:29,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3170386.6666666665, ans=0.125 2023-11-27 17:54:48,467 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6650, loss[loss=0.05202, simple_loss=0.07403, pruned_loss=0.00746, audio_tagging_loss=0.00754, over 14635.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08844, pruned_loss=0.01224, audio_tagging_loss=0.008823, over 3038772.31 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:54:56,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3170520.0, ans=0.2 2023-11-27 17:55:03,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3170586.6666666665, ans=0.2 2023-11-27 17:55:12,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475600 2023-11-27 17:55:14,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3170653.3333333335, ans=0.2 2023-11-27 17:55:21,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3170653.3333333335, ans=0.125 2023-11-27 17:55:36,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.804e+01 9.430e+01 1.026e+02 1.343e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 17:55:42,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3170786.6666666665, ans=0.0 2023-11-27 17:55:45,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3170853.3333333335, ans=0.125 2023-11-27 17:55:46,575 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6700, loss[loss=0.05703, simple_loss=0.07894, pruned_loss=0.009512, audio_tagging_loss=0.00805, over 16871.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08888, pruned_loss=0.01233, audio_tagging_loss=0.008699, over 3041762.59 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:55:49,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3170853.3333333335, ans=0.04949747468305833 2023-11-27 17:55:52,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3170853.3333333335, ans=0.0 2023-11-27 17:56:09,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475650 2023-11-27 17:56:16,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3170986.6666666665, ans=0.1 2023-11-27 17:56:29,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3171053.3333333335, ans=0.2 2023-11-27 17:56:44,912 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6750, loss[loss=0.07485, simple_loss=0.1007, pruned_loss=0.0148, audio_tagging_loss=0.009689, over 14344.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08853, pruned_loss=0.01232, audio_tagging_loss=0.008781, over 3040693.92 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:56:50,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3171186.6666666665, ans=0.125 2023-11-27 17:56:51,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. 
limit=6.0 2023-11-27 17:56:54,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3171186.6666666665, ans=0.125 2023-11-27 17:56:57,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-27 17:57:01,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3171253.3333333335, ans=0.1 2023-11-27 17:57:06,621 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:57:07,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475700 2023-11-27 17:57:32,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.726e+01 9.253e+01 9.869e+01 1.204e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 17:57:32,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3171453.3333333335, ans=0.125 2023-11-27 17:57:40,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3171453.3333333335, ans=0.5 2023-11-27 17:57:42,253 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6800, loss[loss=0.05344, simple_loss=0.07101, pruned_loss=0.009456, audio_tagging_loss=0.00848, over 14637.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08992, pruned_loss=0.01252, audio_tagging_loss=0.008675, over 3040159.74 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:57:43,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3171520.0, ans=0.125 2023-11-27 17:57:44,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3171520.0, ans=0.09899494936611666 2023-11-27 17:57:48,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3171520.0, ans=0.0 2023-11-27 17:58:05,089 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475750 2023-11-27 17:58:07,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3171653.3333333335, ans=0.125 2023-11-27 17:58:22,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3171720.0, ans=0.125 2023-11-27 17:58:32,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3171786.6666666665, ans=0.1 2023-11-27 17:58:40,109 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6850, loss[loss=0.05873, simple_loss=0.08588, pruned_loss=0.008028, audio_tagging_loss=0.007765, over 15692.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08964, pruned_loss=0.01244, audio_tagging_loss=0.008659, over 3042062.12 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:58:43,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3171853.3333333335, ans=0.125 2023-11-27 17:58:43,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3171853.3333333335, ans=0.2 2023-11-27 17:58:45,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2023-11-27 17:58:55,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3171920.0, ans=0.125 2023-11-27 17:58:56,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3171920.0, ans=0.125 2023-11-27 17:59:03,413 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475800 2023-11-27 17:59:08,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5 2023-11-27 17:59:13,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3172053.3333333335, ans=0.125 2023-11-27 17:59:20,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-27 17:59:28,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.901e+01 9.541e+01 1.005e+02 1.351e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 17:59:38,191 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6900, loss[loss=0.06416, simple_loss=0.09802, pruned_loss=0.007829, audio_tagging_loss=0.007323, over 15883.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0903, pruned_loss=0.01246, audio_tagging_loss=0.008572, over 3043022.38 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:59:40,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172186.6666666665, ans=0.1 2023-11-27 17:59:47,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3172186.6666666665, ans=0.2 2023-11-27 17:59:51,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-27 18:00:01,174 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-27 18:00:19,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3172386.6666666665, ans=0.125 2023-11-27 18:00:22,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3172386.6666666665, ans=0.125 2023-11-27 18:00:25,633 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:00:34,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3172453.3333333335, ans=0.0 2023-11-27 18:00:36,156 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 6950, loss[loss=0.06578, simple_loss=0.09124, pruned_loss=0.01319, audio_tagging_loss=0.006968, over 15407.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09007, pruned_loss=0.01233, audio_tagging_loss=0.008643, over 3045313.54 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:00:38,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3172520.0, ans=0.0 2023-11-27 18:00:52,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3172586.6666666665, ans=0.125 2023-11-27 18:00:58,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3172653.3333333335, ans=0.0 2023-11-27 18:00:59,188 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-27 18:01:01,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3172653.3333333335, ans=0.0 2023-11-27 18:01:11,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3172720.0, ans=0.2 2023-11-27 18:01:15,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3172720.0, ans=0.015 2023-11-27 18:01:24,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.751e+01 9.118e+01 9.607e+01 1.229e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 18:01:30,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172786.6666666665, ans=0.1 2023-11-27 18:01:33,720 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7000, loss[loss=0.06791, simple_loss=0.09709, pruned_loss=0.01255, audio_tagging_loss=0.006814, over 14969.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09015, pruned_loss=0.0125, audio_tagging_loss=0.008693, over 3037687.45 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:01:35,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3172853.3333333335, ans=0.95 2023-11-27 18:01:39,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-27 18:01:56,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-27 18:02:30,896 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7050, loss[loss=0.07392, simple_loss=0.1112, pruned_loss=0.01194, audio_tagging_loss=0.006371, over 16914.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09007, pruned_loss=0.01247, audio_tagging_loss=0.008707, over 3034731.46 frames. 
], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:02:39,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3173186.6666666665, ans=0.09899494936611666 2023-11-27 18:02:44,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.77 vs. limit=22.5 2023-11-27 18:02:54,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-27 18:02:55,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3173320.0, ans=0.125 2023-11-27 18:03:03,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2023-11-27 18:03:21,881 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.705e+01 9.232e+01 9.917e+01 1.412e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 18:03:31,261 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7100, loss[loss=0.08752, simple_loss=0.1184, pruned_loss=0.02136, audio_tagging_loss=0.00698, over 14492.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09071, pruned_loss=0.01263, audio_tagging_loss=0.008759, over 3040303.79 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:03:54,152 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476050 2023-11-27 18:04:03,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3173653.3333333335, ans=0.0 2023-11-27 18:04:11,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2023-11-27 18:04:28,665 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7150, loss[loss=0.06724, simple_loss=0.0936, pruned_loss=0.007963, audio_tagging_loss=0.01248, over 15107.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09057, pruned_loss=0.01246, audio_tagging_loss=0.008926, over 3040285.66 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:04:40,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3173920.0, ans=0.125 2023-11-27 18:04:47,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3173920.0, ans=0.0 2023-11-27 18:04:48,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3173920.0, ans=0.125 2023-11-27 18:04:51,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476100 2023-11-27 18:05:09,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3174053.3333333335, ans=10.0 2023-11-27 18:05:17,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.847e+01 9.283e+01 1.002e+02 1.688e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 18:05:17,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3174120.0, ans=0.2 2023-11-27 18:05:22,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3174120.0, ans=10.0 2023-11-27 18:05:22,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2023-11-27 18:05:25,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3174186.6666666665, ans=0.125 2023-11-27 18:05:25,998 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7200, loss[loss=0.05576, simple_loss=0.07739, pruned_loss=0.009542, audio_tagging_loss=0.007524, over 15444.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09144, pruned_loss=0.01267, audio_tagging_loss=0.008885, over 3055221.89 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:05:26,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3174186.6666666665, ans=0.125 2023-11-27 18:05:36,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.98 vs. limit=6.0 2023-11-27 18:05:44,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3174253.3333333335, ans=0.2 2023-11-27 18:05:47,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-27 18:05:49,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476150 2023-11-27 18:06:20,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3174453.3333333335, ans=0.05 2023-11-27 18:06:23,129 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7250, loss[loss=0.08724, simple_loss=0.1203, pruned_loss=0.01887, audio_tagging_loss=0.008223, over 15073.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09153, pruned_loss=0.01283, audio_tagging_loss=0.008871, over 3053465.95 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:06:43,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3174586.6666666665, ans=0.125 2023-11-27 18:06:46,775 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476200 2023-11-27 18:06:54,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3174653.3333333335, ans=0.125 2023-11-27 18:07:09,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=22.5 2023-11-27 18:07:09,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3174786.6666666665, ans=0.0 2023-11-27 18:07:11,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.649e+01 9.273e+01 9.853e+01 1.291e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 18:07:21,682 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7300, loss[loss=0.06118, simple_loss=0.08866, pruned_loss=0.01022, audio_tagging_loss=0.006632, over 15170.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09051, pruned_loss=0.01263, audio_tagging_loss=0.008885, over 3052595.88 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:07:30,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3174853.3333333335, ans=0.125 2023-11-27 18:07:34,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3174920.0, ans=0.2 2023-11-27 18:07:44,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476250 2023-11-27 18:08:00,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3175053.3333333335, ans=0.0 2023-11-27 18:08:06,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3175120.0, ans=0.125 2023-11-27 18:08:17,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3175120.0, ans=0.95 2023-11-27 18:08:19,046 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7350, loss[loss=0.0604, simple_loss=0.08627, pruned_loss=0.01012, audio_tagging_loss=0.007143, over 14041.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09052, pruned_loss=0.01263, audio_tagging_loss=0.008697, over 3045056.19 frames. 
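The optim.py:476 lines summarize the recent distribution of gradient norms: the five numbers read naturally as [min, 25%, median, 75%, max], and the logged threshold consistently equals Clipping_scale × median (here 2.0 × 9.273e+01 = 1.855e+02), with percent-clipped reporting how often that threshold was exceeded. A sketch of that bookkeeping, assuming those semantics rather than quoting icefall's ScaledAdam:

```python
import torch

# Hedged sketch of the "grad-norm quartiles ... threshold ... percent-clipped"
# diagnostics. `grad_norms` is a 1-D float tensor of recent per-step norms.
def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(
        grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # 2.0 x median, as logged
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```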
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:08:20,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3175186.6666666665, ans=0.125 2023-11-27 18:08:21,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3175186.6666666665, ans=0.07 2023-11-27 18:08:32,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3175253.3333333335, ans=0.1 2023-11-27 18:08:35,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3175253.3333333335, ans=0.125 2023-11-27 18:08:41,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476300 2023-11-27 18:08:53,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2023-11-27 18:09:08,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.708e+01 9.249e+01 1.003e+02 1.493e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 18:09:12,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3175453.3333333335, ans=0.125 2023-11-27 18:09:15,716 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7400, loss[loss=0.08219, simple_loss=0.1176, pruned_loss=0.0183, audio_tagging_loss=0.005103, over 14757.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09097, pruned_loss=0.01275, audio_tagging_loss=0.008635, over 3040175.21 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:09:17,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3175520.0, ans=0.125 2023-11-27 18:09:17,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3175520.0, ans=0.125 2023-11-27 18:09:18,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175520.0, ans=0.1 2023-11-27 18:09:27,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3175586.6666666665, ans=0.125 2023-11-27 18:09:39,311 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476350 2023-11-27 18:09:58,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3175720.0, ans=0.125 2023-11-27 18:10:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3175786.6666666665, ans=0.125 2023-11-27 18:10:12,903 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7450, loss[loss=0.0722, simple_loss=0.1007, pruned_loss=0.01547, audio_tagging_loss=0.006363, over 14400.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09079, pruned_loss=0.01283, audio_tagging_loss=0.008598, over 3033959.32 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:10:32,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3175920.0, ans=0.1 2023-11-27 18:10:36,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476400 2023-11-27 18:10:41,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3175986.6666666665, ans=0.125 2023-11-27 18:10:45,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3175986.6666666665, ans=0.0 2023-11-27 18:11:02,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3176120.0, ans=0.2 2023-11-27 18:11:02,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.662e+01 9.277e+01 9.892e+01 1.175e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 18:11:11,066 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7500, loss[loss=0.05427, simple_loss=0.07223, pruned_loss=0.007927, audio_tagging_loss=0.01023, over 15206.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09058, pruned_loss=0.01284, audio_tagging_loss=0.008719, over 3038558.28 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:11:33,551 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476450 2023-11-27 18:11:57,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3176453.3333333335, ans=0.0 2023-11-27 18:12:05,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3176453.3333333335, ans=0.125 2023-11-27 18:12:08,307 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7550, loss[loss=0.08849, simple_loss=0.1207, pruned_loss=0.01983, audio_tagging_loss=0.008301, over 15738.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09076, pruned_loss=0.01284, audio_tagging_loss=0.00871, over 3038061.80 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:12:31,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476500 2023-11-27 18:12:45,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3176720.0, ans=0.0 2023-11-27 18:12:48,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3176720.0, ans=0.0 2023-11-27 18:12:57,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.727e+01 9.587e+01 1.045e+02 1.317e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 18:12:59,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3176786.6666666665, ans=0.125 2023-11-27 18:13:01,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3176786.6666666665, ans=0.125 2023-11-27 18:13:05,312 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7600, loss[loss=0.06198, simple_loss=0.0839, pruned_loss=0.01034, audio_tagging_loss=0.009691, over 14770.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09093, pruned_loss=0.01285, audio_tagging_loss=0.00866, over 3043846.91 frames. 
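The grad_scale field in the batch headers only ever halves or doubles (16.0 → 32.0 across the last two headers above), which is the signature of dynamic loss scaling in mixed-precision training. A minimal fp16 step with torch.cuda.amp.GradScaler showing that behavior; `model`, `optimizer`, and `batch` are placeholders, not names from this recipe:

```python
import torch

# init_scale chosen to match the magnitudes in this log; illustration only.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def fp16_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if inf/nan grads were found
    scaler.update()         # here the scale halves on overflow, or doubles
                            # again after a run of clean steps
```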
], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:13:07,115 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:13:16,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3176920.0, ans=0.125 2023-11-27 18:13:28,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476550 2023-11-27 18:14:03,563 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7650, loss[loss=0.05516, simple_loss=0.0757, pruned_loss=0.008332, audio_tagging_loss=0.008983, over 14477.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09049, pruned_loss=0.01283, audio_tagging_loss=0.00866, over 3044515.38 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:14:26,025 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476600 2023-11-27 18:14:52,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.590e+01 9.167e+01 9.909e+01 1.245e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 18:14:54,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3177453.3333333335, ans=0.125 2023-11-27 18:14:55,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-27 18:15:01,115 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7700, loss[loss=0.06003, simple_loss=0.08625, pruned_loss=0.008432, audio_tagging_loss=0.008472, over 15276.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09083, pruned_loss=0.0129, audio_tagging_loss=0.008673, over 3044376.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:15:01,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3177520.0, ans=0.0 2023-11-27 18:15:05,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3177520.0, ans=0.0 2023-11-27 18:15:10,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3177520.0, ans=0.0 2023-11-27 18:15:23,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476650 2023-11-27 18:15:44,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3177720.0, ans=0.125 2023-11-27 18:15:57,831 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7750, loss[loss=0.05476, simple_loss=0.07025, pruned_loss=0.009062, audio_tagging_loss=0.01058, over 15788.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09008, pruned_loss=0.01278, audio_tagging_loss=0.008703, over 3042850.19 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:15:58,086 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:16:14,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3177920.0, ans=0.2 2023-11-27 18:16:18,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3177920.0, ans=0.125 2023-11-27 18:16:21,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476700 2023-11-27 18:16:43,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3178120.0, ans=0.0 2023-11-27 18:16:48,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.657e+01 9.352e+01 9.918e+01 1.309e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 18:16:54,648 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7800, loss[loss=0.069, simple_loss=0.08883, pruned_loss=0.01308, audio_tagging_loss=0.0115, over 14641.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09041, pruned_loss=0.0129, audio_tagging_loss=0.008679, over 3037868.78 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:16:56,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3178186.6666666665, ans=0.0 2023-11-27 18:17:02,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3178186.6666666665, ans=0.0 2023-11-27 18:17:09,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-27 18:17:11,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3178253.3333333335, ans=0.125 2023-11-27 18:17:15,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3178253.3333333335, ans=0.125 2023-11-27 18:17:16,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3178253.3333333335, ans=0.0 2023-11-27 18:17:18,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476750 2023-11-27 18:17:23,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3178320.0, ans=0.0 2023-11-27 18:17:23,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3178320.0, ans=0.1 2023-11-27 18:17:32,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3178386.6666666665, ans=0.1 2023-11-27 18:17:34,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.51 vs. limit=10.0 2023-11-27 18:17:51,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3178520.0, ans=0.125 2023-11-27 18:17:53,008 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7850, loss[loss=0.06239, simple_loss=0.07608, pruned_loss=0.01238, audio_tagging_loss=0.01197, over 15576.00 frames. 
], tot_loss[loss=0.06653, simple_loss=0.08975, pruned_loss=0.01289, audio_tagging_loss=0.008765, over 3032909.84 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:17:56,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2023-11-27 18:17:59,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3178520.0, ans=0.1 2023-11-27 18:18:03,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3178586.6666666665, ans=0.125 2023-11-27 18:18:15,353 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476800 2023-11-27 18:18:23,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3178653.3333333335, ans=0.0 2023-11-27 18:18:27,223 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:18:28,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3178720.0, ans=0.0 2023-11-27 18:18:37,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3178720.0, ans=0.125 2023-11-27 18:18:44,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.789e+01 9.389e+01 9.930e+01 1.229e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 18:18:48,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3178786.6666666665, ans=0.2 2023-11-27 18:18:50,077 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7900, loss[loss=0.05913, simple_loss=0.06949, pruned_loss=0.01488, audio_tagging_loss=0.009501, over 14727.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08962, pruned_loss=0.01282, audio_tagging_loss=0.008842, over 3035128.20 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:19:00,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-27 18:19:06,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3178920.0, ans=0.1 2023-11-27 18:19:13,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476850 2023-11-27 18:19:42,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3179120.0, ans=0.125 2023-11-27 18:19:47,735 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 7950, loss[loss=0.06265, simple_loss=0.08668, pruned_loss=0.01048, audio_tagging_loss=0.008831, over 15181.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08869, pruned_loss=0.01269, audio_tagging_loss=0.008989, over 3037307.35 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:19:54,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3179186.6666666665, ans=0.0 2023-11-27 18:20:05,984 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:20:11,389 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476900 2023-11-27 18:20:37,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3179453.3333333335, ans=0.2 2023-11-27 18:20:39,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.819e+01 9.346e+01 1.021e+02 1.484e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-27 18:20:44,963 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8000, loss[loss=0.07007, simple_loss=0.1007, pruned_loss=0.01021, audio_tagging_loss=0.009494, over 16600.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08905, pruned_loss=0.0126, audio_tagging_loss=0.009006, over 3037517.00 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:20:56,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3179586.6666666665, ans=0.125 2023-11-27 18:21:00,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179586.6666666665, ans=0.1 2023-11-27 18:21:07,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3179653.3333333335, ans=0.125 2023-11-27 18:21:08,516 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 476950 2023-11-27 18:21:16,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179653.3333333335, ans=0.1 2023-11-27 18:21:25,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179720.0, ans=0.1 2023-11-27 18:21:32,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2023-11-27 18:21:36,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3179786.6666666665, ans=0.0 2023-11-27 18:21:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3179786.6666666665, ans=0.125 2023-11-27 18:21:42,507 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8050, loss[loss=0.06349, simple_loss=0.08384, pruned_loss=0.01309, audio_tagging_loss=0.008474, over 13885.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08992, pruned_loss=0.01258, audio_tagging_loss=0.008978, over 3043835.18 frames. 
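The WARNING above shows the recipe's short-utterance filter at work: 100 feature frames shrink to 23 after the convolutional front end, and a transducer cannot align 24 tokens to 23 frames, so the cut is excluded. A sketch of the arithmetic, assuming the usual ((T - 7) // 2 + 1) // 2 reduction behind the "Number of frames (after subsampling)" figure; the real filter is in train_asr.py:

```python
# Hedged sketch of the exclusion rule logged in these WARNINGs.
def frames_after_subsampling(num_frames: int) -> int:
    # Conv2d front end: two stride-2 stages with a 7-frame context loss.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23   # matches the WARNING above
assert not keep_cut(100, 24)                 # 23 frames < 24 tokens: drop
```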
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:21:47,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3179853.3333333335, ans=0.95 2023-11-27 18:22:05,358 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477000 2023-11-27 18:22:25,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3180053.3333333335, ans=0.2 2023-11-27 18:22:29,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3180120.0, ans=0.125 2023-11-27 18:22:35,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.539e+01 9.239e+01 9.821e+01 1.214e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 18:22:39,955 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8100, loss[loss=0.04859, simple_loss=0.05995, pruned_loss=0.008915, audio_tagging_loss=0.009702, over 16377.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08963, pruned_loss=0.01237, audio_tagging_loss=0.009025, over 3043204.02 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:22:45,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3180186.6666666665, ans=0.0 2023-11-27 18:22:55,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3180253.3333333335, ans=0.125 2023-11-27 18:23:03,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477050 2023-11-27 18:23:36,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3180520.0, ans=0.1 2023-11-27 18:23:36,915 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8150, loss[loss=0.05826, simple_loss=0.0762, pruned_loss=0.009098, audio_tagging_loss=0.01107, over 16424.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09125, pruned_loss=0.01259, audio_tagging_loss=0.008923, over 3052162.01 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:24:00,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477100 2023-11-27 18:24:25,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3180786.6666666665, ans=0.1 2023-11-27 18:24:29,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.482e+01 9.156e+01 9.778e+01 1.274e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-27 18:24:34,789 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8200, loss[loss=0.06722, simple_loss=0.0906, pruned_loss=0.01255, audio_tagging_loss=0.009363, over 16436.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09131, pruned_loss=0.01266, audio_tagging_loss=0.00882, over 3048624.35 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:24:39,148 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 18:24:51,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3180920.0, ans=0.0 2023-11-27 18:24:56,937 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477150 2023-11-27 18:25:05,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3180986.6666666665, ans=0.125 2023-11-27 18:25:31,877 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8250, loss[loss=0.05099, simple_loss=0.07239, pruned_loss=0.006118, audio_tagging_loss=0.008672, over 15273.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0892, pruned_loss=0.01226, audio_tagging_loss=0.008741, over 3050460.44 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:25:33,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3181186.6666666665, ans=0.0 2023-11-27 18:25:51,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2023-11-27 18:25:54,898 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477200 2023-11-27 18:25:59,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3181320.0, ans=0.1 2023-11-27 18:26:04,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3181320.0, ans=0.0 2023-11-27 18:26:24,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.573e+01 9.130e+01 1.008e+02 1.998e+02, threshold=1.826e+02, percent-clipped=1.0 2023-11-27 18:26:28,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3181520.0, ans=0.125 2023-11-27 18:26:29,301 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8300, loss[loss=0.08528, simple_loss=0.1203, pruned_loss=0.0181, audio_tagging_loss=0.007057, over 15348.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0891, pruned_loss=0.01224, audio_tagging_loss=0.008736, over 3049162.60 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:26:52,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477250 2023-11-27 18:27:17,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3181786.6666666665, ans=0.2 2023-11-27 18:27:24,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3181786.6666666665, ans=0.125 2023-11-27 18:27:26,437 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8350, loss[loss=0.0517, simple_loss=0.06355, pruned_loss=0.009207, audio_tagging_loss=0.01072, over 14766.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08897, pruned_loss=0.01231, audio_tagging_loss=0.00883, over 3047251.79 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:27:41,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3181920.0, ans=0.125 2023-11-27 18:27:46,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3181920.0, ans=0.0 2023-11-27 18:27:49,285 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477300 2023-11-27 18:28:08,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3182053.3333333335, ans=0.0 2023-11-27 18:28:19,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.748e+01 9.541e+01 1.013e+02 1.320e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 18:28:21,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3182120.0, ans=0.0 2023-11-27 18:28:23,395 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8400, loss[loss=0.06459, simple_loss=0.09137, pruned_loss=0.009548, audio_tagging_loss=0.009353, over 15294.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08863, pruned_loss=0.01217, audio_tagging_loss=0.008869, over 3051332.90 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:28:37,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3182253.3333333335, ans=0.0 2023-11-27 18:28:46,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477350 2023-11-27 18:29:05,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3182386.6666666665, ans=0.2 2023-11-27 18:29:20,940 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8450, loss[loss=0.05161, simple_loss=0.06225, pruned_loss=0.01034, audio_tagging_loss=0.01015, over 14227.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08901, pruned_loss=0.01224, audio_tagging_loss=0.008885, over 3048637.52 frames. 
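The scaling.py:213 entries track ScheduledFloat parameters: dropout probabilities, skip rates and balancer settings that are functions of the global batch_count rather than constants, which is why the same name keeps reappearing with a fresh count attached. A minimal re-implementation of the idea, piecewise-linear interpolation between (batch_count, value) anchors; illustration only, not icefall's class:

```python
# Hedged sketch of a batch-count-scheduled scalar.
class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # linear interpolation between the two anchors
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert abs(p.value_at(10000.0) - 0.2) < 1e-9
```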
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:29:23,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3182520.0, ans=0.0 2023-11-27 18:29:25,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3182520.0, ans=0.125 2023-11-27 18:29:36,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3182586.6666666665, ans=0.125 2023-11-27 18:29:43,537 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477400 2023-11-27 18:29:51,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3182653.3333333335, ans=0.0 2023-11-27 18:29:55,091 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:30:10,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3182786.6666666665, ans=0.125 2023-11-27 18:30:13,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.652e+01 9.208e+01 1.012e+02 1.151e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 18:30:18,896 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8500, loss[loss=0.06067, simple_loss=0.08105, pruned_loss=0.0118, audio_tagging_loss=0.008339, over 15698.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09035, pruned_loss=0.01258, audio_tagging_loss=0.008823, over 3051235.07 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:30:36,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3182920.0, ans=0.1 2023-11-27 18:30:37,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3182920.0, ans=0.1 2023-11-27 18:30:42,042 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477450 2023-11-27 18:30:44,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3182986.6666666665, ans=0.2 2023-11-27 18:30:47,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3182986.6666666665, ans=0.125 2023-11-27 18:30:49,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3182986.6666666665, ans=0.125 2023-11-27 18:30:54,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3183053.3333333335, ans=0.0 2023-11-27 18:31:16,558 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8550, loss[loss=0.07035, simple_loss=0.1017, pruned_loss=0.01264, audio_tagging_loss=0.006839, over 14836.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09055, pruned_loss=0.01258, audio_tagging_loss=0.008754, over 3051615.76 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:31:24,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3183186.6666666665, ans=0.1 2023-11-27 18:31:36,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3183253.3333333335, ans=0.125 2023-11-27 18:31:39,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477500 2023-11-27 18:31:45,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3183320.0, ans=0.0 2023-11-27 18:31:48,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183320.0, ans=0.1 2023-11-27 18:31:59,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3183386.6666666665, ans=0.125 2023-11-27 18:32:09,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.725e+01 9.304e+01 1.021e+02 1.373e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 18:32:13,965 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8600, loss[loss=0.05685, simple_loss=0.08471, pruned_loss=0.006219, audio_tagging_loss=0.008276, over 15624.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09011, pruned_loss=0.01232, audio_tagging_loss=0.008779, over 3044961.09 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:32:25,601 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:32:36,465 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477550 2023-11-27 18:32:44,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3183653.3333333335, ans=0.125 2023-11-27 18:32:47,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3183720.0, ans=0.125 2023-11-27 18:33:03,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3183786.6666666665, ans=0.2 2023-11-27 18:33:11,372 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8650, loss[loss=0.04144, simple_loss=0.05331, pruned_loss=0.00558, audio_tagging_loss=0.009208, over 13734.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09001, pruned_loss=0.01236, audio_tagging_loss=0.008818, over 3042720.79 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:33:21,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.28 vs. 
limit=15.0 2023-11-27 18:33:23,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3183920.0, ans=0.0 2023-11-27 18:33:26,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3183920.0, ans=0.125 2023-11-27 18:33:27,801 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:33:28,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3183920.0, ans=0.0 2023-11-27 18:33:32,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3183920.0, ans=0.1 2023-11-27 18:33:33,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. limit=10.0 2023-11-27 18:33:34,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477600 2023-11-27 18:34:04,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.946e+01 9.500e+01 1.005e+02 1.406e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 18:34:04,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3184120.0, ans=0.07 2023-11-27 18:34:08,470 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8700, loss[loss=0.04925, simple_loss=0.07036, pruned_loss=0.005153, audio_tagging_loss=0.008916, over 15679.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09016, pruned_loss=0.01243, audio_tagging_loss=0.008898, over 3047106.13 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:34:08,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3184186.6666666665, ans=0.125 2023-11-27 18:34:23,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3184253.3333333335, ans=0.125 2023-11-27 18:34:27,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.03 vs. limit=10.0 2023-11-27 18:34:31,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3184320.0, ans=0.0 2023-11-27 18:34:31,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477650 2023-11-27 18:34:55,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-11-27 18:35:05,937 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8750, loss[loss=0.07446, simple_loss=0.09741, pruned_loss=0.01835, audio_tagging_loss=0.007405, over 15610.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09035, pruned_loss=0.01253, audio_tagging_loss=0.008922, over 3036173.80 frames. 
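The Whitening lines (scaling.py:1022) measure how far a module's activation covariance is from a multiple of the identity: a metric of 1.0 would be perfectly "white", and the penalty only engages once the metric exceeds the logged limit, so an entry like "metric=7.53 vs. limit=10.0" records a no-op. An illustrative metric with that behavior, the eigenvalue dispersion of the covariance; not necessarily icefall's exact formula:

```python
import torch

# Hedged sketch: the value is >= 1.0 always, and equals 1.0 exactly when
# all covariance eigenvalues are equal (activations white up to scale).
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)
```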
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:35:28,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477700 2023-11-27 18:35:35,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3184653.3333333335, ans=0.2 2023-11-27 18:35:38,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3184720.0, ans=0.125 2023-11-27 18:35:53,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-27 18:35:58,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.815e+01 9.414e+01 9.987e+01 1.374e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 18:36:03,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3184853.3333333335, ans=0.125 2023-11-27 18:36:03,875 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8800, loss[loss=0.05971, simple_loss=0.08354, pruned_loss=0.009525, audio_tagging_loss=0.008414, over 15916.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09064, pruned_loss=0.01278, audio_tagging_loss=0.009009, over 3043709.95 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:36:14,939 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:36:26,065 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477750 2023-11-27 18:36:26,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3184986.6666666665, ans=0.125 2023-11-27 18:36:37,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2023-11-27 18:36:39,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3185053.3333333335, ans=0.125 2023-11-27 18:36:42,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3185053.3333333335, ans=0.125 2023-11-27 18:37:00,079 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8850, loss[loss=0.08945, simple_loss=0.1063, pruned_loss=0.02352, audio_tagging_loss=0.01279, over 16149.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09086, pruned_loss=0.01278, audio_tagging_loss=0.00899, over 3045139.54 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:37:13,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-27 18:37:14,846 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 18:37:16,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3185253.3333333335, ans=0.125 2023-11-27 18:37:23,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477800 2023-11-27 18:37:24,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=22.5 2023-11-27 18:37:26,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3185320.0, ans=0.125 2023-11-27 18:37:27,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3185320.0, ans=0.025 2023-11-27 18:37:27,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-11-27 18:37:54,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.632e+01 9.430e+01 1.040e+02 1.292e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 18:37:57,543 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8900, loss[loss=0.06315, simple_loss=0.08063, pruned_loss=0.01487, audio_tagging_loss=0.007971, over 15315.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09115, pruned_loss=0.01272, audio_tagging_loss=0.008773, over 3047555.04 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:38:07,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185520.0, ans=0.1 2023-11-27 18:38:08,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3185586.6666666665, ans=0.0 2023-11-27 18:38:19,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3185653.3333333335, ans=0.125 2023-11-27 18:38:20,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477850 2023-11-27 18:38:26,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2023-11-27 18:38:29,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3185653.3333333335, ans=0.125 2023-11-27 18:38:34,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5 2023-11-27 18:38:51,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=22.5 2023-11-27 18:38:53,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3185853.3333333335, ans=0.0 2023-11-27 18:38:54,431 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 8950, loss[loss=0.0804, simple_loss=0.1126, pruned_loss=0.01563, audio_tagging_loss=0.008484, over 16599.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09011, pruned_loss=0.01251, audio_tagging_loss=0.008718, over 3053716.73 frames. 
], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:39:16,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477900 2023-11-27 18:39:31,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=22.5 2023-11-27 18:39:49,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.925e+01 9.376e+01 9.837e+01 1.193e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 18:39:51,780 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9000, loss[loss=0.06863, simple_loss=0.08982, pruned_loss=0.01544, audio_tagging_loss=0.00828, over 14369.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08952, pruned_loss=0.01247, audio_tagging_loss=0.008633, over 3047902.86 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:39:51,781 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 18:40:17,653 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0983, 2.4017, 5.0010, 2.8506], device='cuda:1') 2023-11-27 18:40:27,258 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05837, simple_loss=0.05058, pruned_loss=0.005173, audio_tagging_loss=0.02791, over 4681554.00 frames. 2023-11-27 18:40:27,258 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 18:40:50,084 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 477950 2023-11-27 18:41:00,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3186386.6666666665, ans=0.1 2023-11-27 18:41:06,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0 2023-11-27 18:41:16,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3186453.3333333335, ans=0.125 2023-11-27 18:41:19,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2023-11-27 18:41:25,260 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9050, loss[loss=0.06159, simple_loss=0.08838, pruned_loss=0.009338, audio_tagging_loss=0.008063, over 15165.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08925, pruned_loss=0.01238, audio_tagging_loss=0.008636, over 3049445.29 frames. 
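Two diagnostics surface in the validation pass above: a whole-dev-set loss over 4681554 frames, and per-head attention-entropy tensors from zipformer.py:1877, one value per head, so a low entry such as 2.4 flags a head concentrating on few positions. A sketch of the entropy computation, assuming weights normalized over the key axis; the logged tensors come from icefall's zipformer internals:

```python
import torch

# Hedged sketch of per-head attention entropy.
def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, num_queries, num_keys), rows sum to 1
    ent = -(attn_weights * (attn_weights + 1.0e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)   # average over query positions -> one per head
```

For a uniform distribution over K key positions the entropy is ln K (about 5.3 for K = 200), which is the right order of magnitude for the larger values logged here.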
], batch size: 57, lr: 1.69e-03, grad_scale: 4.0 2023-11-27 18:41:33,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3186520.0, ans=0.125 2023-11-27 18:41:45,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3186586.6666666665, ans=0.125 2023-11-27 18:41:46,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3186653.3333333335, ans=0.125 2023-11-27 18:41:47,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478000 2023-11-27 18:42:19,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3186786.6666666665, ans=0.0 2023-11-27 18:42:21,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.889e+01 9.370e+01 1.013e+02 1.191e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 18:42:22,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=22.5 2023-11-27 18:42:22,734 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9100, loss[loss=0.06972, simple_loss=0.11, pruned_loss=0.009356, audio_tagging_loss=0.005373, over 15159.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08975, pruned_loss=0.01243, audio_tagging_loss=0.008573, over 3059136.57 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:42:40,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3186920.0, ans=0.125 2023-11-27 18:42:45,868 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478050 2023-11-27 18:43:12,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3187120.0, ans=0.2 2023-11-27 18:43:18,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3187120.0, ans=0.125 2023-11-27 18:43:20,526 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9150, loss[loss=0.06036, simple_loss=0.08015, pruned_loss=0.01174, audio_tagging_loss=0.008549, over 16305.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08996, pruned_loss=0.01243, audio_tagging_loss=0.008531, over 3055909.00 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:43:39,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3187253.3333333335, ans=0.0 2023-11-27 18:43:44,098 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-27 18:44:17,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.554e+01 9.287e+01 9.975e+01 1.548e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 18:44:18,384 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9200, loss[loss=0.07099, simple_loss=0.1036, pruned_loss=0.01422, audio_tagging_loss=0.004944, over 15378.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09063, pruned_loss=0.01254, audio_tagging_loss=0.008503, over 3053225.84 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:44:22,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3187520.0, ans=0.125 2023-11-27 18:44:30,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3187586.6666666665, ans=0.125 2023-11-27 18:44:34,468 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:44:41,023 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-27 18:44:57,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3187720.0, ans=0.05 2023-11-27 18:44:58,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3187720.0, ans=0.125 2023-11-27 18:45:07,290 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:45:15,784 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9250, loss[loss=0.06116, simple_loss=0.08199, pruned_loss=0.01241, audio_tagging_loss=0.007756, over 14887.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08999, pruned_loss=0.01236, audio_tagging_loss=0.008625, over 3056374.07 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:45:36,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2023-11-27 18:45:38,992 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-27 18:45:48,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3187986.6666666665, ans=0.0 2023-11-27 18:46:00,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3188053.3333333335, ans=0.0 2023-11-27 18:46:12,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.841e+01 9.296e+01 9.979e+01 1.330e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-27 18:46:12,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3188186.6666666665, ans=0.125 2023-11-27 18:46:13,730 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9300, loss[loss=0.05624, simple_loss=0.08083, pruned_loss=0.008484, audio_tagging_loss=0.007346, over 15132.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09017, pruned_loss=0.01254, audio_tagging_loss=0.00859, over 3054382.43 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:46:13,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3188186.6666666665, ans=0.0 2023-11-27 18:46:21,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3188186.6666666665, ans=0.5 2023-11-27 18:46:23,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.43 vs. 
limit=15.0 2023-11-27 18:46:26,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3188253.3333333335, ans=0.125 2023-11-27 18:46:29,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3188253.3333333335, ans=0.0 2023-11-27 18:46:37,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-27 18:46:47,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-27 18:46:53,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-27 18:47:02,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3188453.3333333335, ans=0.0 2023-11-27 18:47:10,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-27 18:47:11,351 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9350, loss[loss=0.05986, simple_loss=0.07672, pruned_loss=0.01216, audio_tagging_loss=0.009342, over 15328.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09036, pruned_loss=0.01253, audio_tagging_loss=0.008665, over 3055796.69 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:47:23,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3188586.6666666665, ans=0.125 2023-11-27 18:47:34,513 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-27 18:47:43,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3188653.3333333335, ans=0.125 2023-11-27 18:47:55,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188720.0, ans=0.1 2023-11-27 18:48:04,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3188786.6666666665, ans=0.015 2023-11-27 18:48:08,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.625e+01 9.314e+01 1.018e+02 1.859e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 18:48:09,691 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9400, loss[loss=0.06257, simple_loss=0.08701, pruned_loss=0.01059, audio_tagging_loss=0.008477, over 16288.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09093, pruned_loss=0.01269, audio_tagging_loss=0.008757, over 3060403.37 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:48:30,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3188920.0, ans=0.2 2023-11-27 18:48:32,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-27 18:48:40,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3188986.6666666665, ans=0.125 2023-11-27 18:49:07,280 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9450, loss[loss=0.08224, simple_loss=0.1159, pruned_loss=0.01473, audio_tagging_loss=0.009579, over 15368.00 frames. 
2023-11-27 18:49:08,443 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 18:49:15,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3189186.6666666665, ans=0.125
2023-11-27 18:49:27,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189253.3333333335, ans=0.1
2023-11-27 18:49:30,395 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478400
2023-11-27 18:49:31,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0
2023-11-27 18:49:34,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0
2023-11-27 18:49:44,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3189386.6666666665, ans=0.1
2023-11-27 18:49:56,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3189453.3333333335, ans=0.125
2023-11-27 18:49:58,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3189453.3333333335, ans=0.1
2023-11-27 18:50:04,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.634e+01 9.375e+01 9.974e+01 1.335e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-27 18:50:04,774 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9500, loss[loss=0.04703, simple_loss=0.0659, pruned_loss=0.005701, audio_tagging_loss=0.008379, over 16499.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09054, pruned_loss=0.01249, audio_tagging_loss=0.008961, over 3053872.06 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 18:50:07,141 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 18:50:24,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3189586.6666666665, ans=0.1
2023-11-27 18:50:28,199 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478450
2023-11-27 18:50:33,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3189653.3333333335, ans=0.125
2023-11-27 18:50:35,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189653.3333333335, ans=0.1
2023-11-27 18:50:45,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3189720.0, ans=0.0
2023-11-27 18:50:59,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3189786.6666666665, ans=0.125
2023-11-27 18:51:00,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3189786.6666666665, ans=0.125
2023-11-27 18:51:02,305 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9550, loss[loss=0.07055, simple_loss=0.09854, pruned_loss=0.01411, audio_tagging_loss=0.007167, over 14374.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09042, pruned_loss=0.01243, audio_tagging_loss=0.008977, over 3055726.46 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 18:51:07,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3189853.3333333335, ans=0.0
2023-11-27 18:51:11,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2023-11-27 18:51:13,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3189920.0, ans=0.125
2023-11-27 18:51:26,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478500
2023-11-27 18:51:48,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3190120.0, ans=0.125
2023-11-27 18:51:52,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3190120.0, ans=0.2
2023-11-27 18:51:59,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.341e+01 9.036e+01 9.952e+01 1.407e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-27 18:51:59,931 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9600, loss[loss=0.059, simple_loss=0.07547, pruned_loss=0.009504, audio_tagging_loss=0.01176, over 14788.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09043, pruned_loss=0.01241, audio_tagging_loss=0.008984, over 3058498.76 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:52:23,577 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478550
2023-11-27 18:52:42,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0
2023-11-27 18:52:57,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3190520.0, ans=0.125
2023-11-27 18:52:58,203 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9650, loss[loss=0.07134, simple_loss=0.09408, pruned_loss=0.01752, audio_tagging_loss=0.006786, over 15937.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09122, pruned_loss=0.01269, audio_tagging_loss=0.008895, over 3054658.04 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:53:02,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3190520.0, ans=0.125
2023-11-27 18:53:03,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3190520.0, ans=12.0
2023-11-27 18:53:09,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3190586.6666666665, ans=0.035
2023-11-27 18:53:11,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5
2023-11-27 18:53:12,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3190586.6666666665, ans=0.125
2023-11-27 18:53:20,779 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478600
2023-11-27 18:53:35,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3190720.0, ans=0.125
2023-11-27 18:53:39,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0
2023-11-27 18:53:55,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3190853.3333333335, ans=0.125
2023-11-27 18:53:55,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.654e+01 9.418e+01 1.007e+02 1.330e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-27 18:53:56,019 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9700, loss[loss=0.07476, simple_loss=0.1079, pruned_loss=0.01213, audio_tagging_loss=0.008694, over 15977.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09083, pruned_loss=0.01263, audio_tagging_loss=0.008779, over 3049450.51 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:53:59,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0
2023-11-27 18:54:11,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3190920.0, ans=0.2
2023-11-27 18:54:14,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3190920.0, ans=0.0
2023-11-27 18:54:19,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478650
2023-11-27 18:54:44,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3191120.0, ans=0.125
2023-11-27 18:54:52,953 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9750, loss[loss=0.05686, simple_loss=0.07561, pruned_loss=0.01016, audio_tagging_loss=0.008901, over 15676.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08995, pruned_loss=0.01246, audio_tagging_loss=0.008756, over 3042204.65 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:55:00,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3191186.6666666665, ans=0.125
2023-11-27 18:55:03,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3191253.3333333335, ans=0.05
2023-11-27 18:55:16,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478700
2023-11-27 18:55:33,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191386.6666666665, ans=0.1
2023-11-27 18:55:50,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3191520.0, ans=0.0
2023-11-27 18:55:51,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.742e+01 9.201e+01 9.783e+01 1.182e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-27 18:55:51,128 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9800, loss[loss=0.07093, simple_loss=0.09886, pruned_loss=0.01146, audio_tagging_loss=0.01003, over 15437.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08938, pruned_loss=0.01222, audio_tagging_loss=0.008725, over 3043591.38 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:55:56,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191520.0, ans=0.1
2023-11-27 18:56:03,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2023-11-27 18:56:13,844 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478750
2023-11-27 18:56:24,340 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 18:56:45,448 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 18:56:48,640 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9850, loss[loss=0.06948, simple_loss=0.09257, pruned_loss=0.01516, audio_tagging_loss=0.008036, over 16325.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09001, pruned_loss=0.01241, audio_tagging_loss=0.008673, over 3047042.49 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:57:01,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2023-11-27 18:57:01,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3191920.0, ans=0.0
2023-11-27 18:57:05,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0
2023-11-27 18:57:11,562 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478800
2023-11-27 18:57:30,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192053.3333333335, ans=0.1
2023-11-27 18:57:34,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3192120.0, ans=0.125
2023-11-27 18:57:45,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.871e+01 9.511e+01 1.009e+02 1.336e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-27 18:57:45,687 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9900, loss[loss=0.07647, simple_loss=0.1053, pruned_loss=0.01491, audio_tagging_loss=0.008898, over 15381.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09023, pruned_loss=0.01243, audio_tagging_loss=0.00863, over 3048181.38 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:58:06,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3192253.3333333335, ans=0.0
2023-11-27 18:58:07,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5
2023-11-27 18:58:09,246 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478850
2023-11-27 18:58:16,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3192320.0, ans=0.04949747468305833
2023-11-27 18:58:21,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0
2023-11-27 18:58:43,970 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 9950, loss[loss=0.05592, simple_loss=0.07717, pruned_loss=0.008246, audio_tagging_loss=0.009089, over 15983.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09053, pruned_loss=0.0125, audio_tagging_loss=0.008651, over 3046631.59 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 18:58:45,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3192520.0, ans=0.0
2023-11-27 18:58:47,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3192520.0, ans=0.125
2023-11-27 18:59:01,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3192586.6666666665, ans=0.0
2023-11-27 18:59:01,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0
2023-11-27 18:59:06,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478900
2023-11-27 18:59:21,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0
2023-11-27 18:59:35,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3192786.6666666665, ans=0.2
2023-11-27 18:59:39,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192786.6666666665, ans=0.1
2023-11-27 18:59:41,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.506e+01 9.259e+01 9.823e+01 1.115e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-27 18:59:41,499 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10000, loss[loss=0.07561, simple_loss=0.1097, pruned_loss=0.0143, audio_tagging_loss=0.006439, over 16310.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08989, pruned_loss=0.01243, audio_tagging_loss=0.008653, over 3044537.44 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0
2023-11-27 18:59:42,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3192853.3333333335, ans=0.2
2023-11-27 18:59:58,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3192920.0, ans=0.1
2023-11-27 19:00:00,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3192920.0, ans=0.125
2023-11-27 19:00:04,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 478950
2023-11-27 19:00:09,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3192986.6666666665, ans=0.04949747468305833
2023-11-27 19:00:18,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3193053.3333333335, ans=0.125
2023-11-27 19:00:20,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0
2023-11-27 19:00:38,035 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10050, loss[loss=0.06202, simple_loss=0.08723, pruned_loss=0.008953, audio_tagging_loss=0.009446, over 16114.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09032, pruned_loss=0.01253, audio_tagging_loss=0.008626, over 3049569.77 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0
2023-11-27 19:00:43,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3193186.6666666665, ans=0.125
2023-11-27 19:00:59,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3193253.3333333335, ans=0.125
2023-11-27 19:01:01,579 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479000
2023-11-27 19:01:35,904 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10100, loss[loss=0.07204, simple_loss=0.09492, pruned_loss=0.01573, audio_tagging_loss=0.00884, over 15391.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09053, pruned_loss=0.01246, audio_tagging_loss=0.008611, over 3054816.26 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:01:36,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.749e+01 9.301e+01 1.017e+02 1.197e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-27 19:01:59,535 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479050
2023-11-27 19:02:17,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3193720.0, ans=0.125
2023-11-27 19:02:19,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3193720.0, ans=0.125
2023-11-27 19:02:24,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3193786.6666666665, ans=0.04949747468305833
2023-11-27 19:02:25,433 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 19:02:34,615 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10150, loss[loss=0.08819, simple_loss=0.124, pruned_loss=0.01826, audio_tagging_loss=0.0079, over 15427.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09034, pruned_loss=0.01245, audio_tagging_loss=0.008751, over 3055076.26 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:02:38,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3193853.3333333335, ans=0.07
2023-11-27 19:02:47,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3193920.0, ans=0.0
2023-11-27 19:02:48,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3193920.0, ans=0.125
2023-11-27 19:02:51,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3193920.0, ans=0.0
2023-11-27 19:02:56,491 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479100
2023-11-27 19:02:56,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3193986.6666666665, ans=0.125
2023-11-27 19:03:00,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0
2023-11-27 19:03:03,414 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 19:03:21,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3194120.0, ans=0.2
2023-11-27 19:03:23,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3194120.0, ans=0.125
2023-11-27 19:03:25,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3194120.0, ans=0.125
2023-11-27 19:03:26,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3194120.0, ans=0.1
2023-11-27 19:03:31,551 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10200, loss[loss=0.08882, simple_loss=0.1201, pruned_loss=0.0219, audio_tagging_loss=0.006857, over 14567.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09045, pruned_loss=0.01236, audio_tagging_loss=0.00881, over 3051184.67 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:03:32,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.681e+01 9.288e+01 9.961e+01 1.325e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-27 19:03:43,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3194253.3333333335, ans=0.125
2023-11-27 19:03:50,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3194253.3333333335, ans=0.1
2023-11-27 19:03:54,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479150
2023-11-27 19:03:54,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3194320.0, ans=0.0
2023-11-27 19:03:57,187 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 19:04:04,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3194320.0, ans=0.0
2023-11-27 19:04:28,874 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10250, loss[loss=0.07164, simple_loss=0.09426, pruned_loss=0.01583, audio_tagging_loss=0.008683, over 14670.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09048, pruned_loss=0.01247, audio_tagging_loss=0.008857, over 3059304.25 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:04:42,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2023-11-27 19:04:52,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479200
2023-11-27 19:05:02,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3194653.3333333335, ans=0.0
2023-11-27 19:05:02,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3194653.3333333335, ans=0.0
2023-11-27 19:05:18,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3194786.6666666665, ans=0.2
2023-11-27 19:05:18,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3194786.6666666665, ans=0.2
2023-11-27 19:05:24,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3194786.6666666665, ans=0.125
2023-11-27 19:05:26,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0
2023-11-27 19:05:27,509 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10300, loss[loss=0.05406, simple_loss=0.07412, pruned_loss=0.007652, audio_tagging_loss=0.009352, over 15656.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.0911, pruned_loss=0.01262, audio_tagging_loss=0.008887, over 3064130.30 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:05:28,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.814e+01 9.491e+01 9.959e+01 1.329e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-27 19:05:36,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3194853.3333333335, ans=0.125
2023-11-27 19:05:42,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0
2023-11-27 19:05:44,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3194920.0, ans=0.2
2023-11-27 19:05:45,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3194920.0, ans=0.5
2023-11-27 19:05:46,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3194920.0, ans=0.0
2023-11-27 19:05:49,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3194986.6666666665, ans=0.1
2023-11-27 19:05:50,042 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479250
2023-11-27 19:05:52,425 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 19:06:03,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3195053.3333333335, ans=0.0
2023-11-27 19:06:07,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. limit=10.0
2023-11-27 19:06:16,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3195120.0, ans=0.1
2023-11-27 19:06:18,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3195120.0, ans=0.0
2023-11-27 19:06:19,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0
2023-11-27 19:06:24,331 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10350, loss[loss=0.07436, simple_loss=0.1046, pruned_loss=0.01272, audio_tagging_loss=0.009322, over 15317.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09168, pruned_loss=0.01271, audio_tagging_loss=0.008943, over 3061159.85 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:06:30,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3195186.6666666665, ans=0.125
2023-11-27 19:06:47,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479300
2023-11-27 19:07:00,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3195386.6666666665, ans=0.125
2023-11-27 19:07:04,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3195386.6666666665, ans=0.1
2023-11-27 19:07:06,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3195386.6666666665, ans=0.1
2023-11-27 19:07:21,730 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10400, loss[loss=0.05717, simple_loss=0.07417, pruned_loss=0.008893, audio_tagging_loss=0.01119, over 15096.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09082, pruned_loss=0.01261, audio_tagging_loss=0.009017, over 3057524.19 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:07:24,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.831e+01 9.287e+01 1.004e+02 1.358e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 19:07:33,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3195586.6666666665, ans=0.07
2023-11-27 19:07:45,852 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479350
2023-11-27 19:07:47,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2023-11-27 19:08:00,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5
2023-11-27 19:08:10,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3195786.6666666665, ans=0.125
2023-11-27 19:08:18,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3195853.3333333335, ans=0.04949747468305833
2023-11-27 19:08:19,754 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10450, loss[loss=0.04606, simple_loss=0.05962, pruned_loss=0.005524, audio_tagging_loss=0.01073, over 15724.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09099, pruned_loss=0.01259, audio_tagging_loss=0.008938, over 3059098.20 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:08:26,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3195853.3333333335, ans=0.125
2023-11-27 19:08:28,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3195853.3333333335, ans=0.025
2023-11-27 19:08:37,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2023-11-27 19:08:40,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3195920.0, ans=0.95
2023-11-27 19:08:43,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479400
2023-11-27 19:09:03,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0
2023-11-27 19:09:18,581 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10500, loss[loss=0.07024, simple_loss=0.09505, pruned_loss=0.01407, audio_tagging_loss=0.008645, over 15212.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09084, pruned_loss=0.01267, audio_tagging_loss=0.008866, over 3055471.39 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:09:20,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.582e+01 9.246e+01 1.004e+02 1.274e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 19:09:22,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3196186.6666666665, ans=0.125
2023-11-27 19:09:31,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196253.3333333335, ans=0.1
2023-11-27 19:09:38,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3196253.3333333335, ans=0.0
2023-11-27 19:09:41,857 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479450
2023-11-27 19:09:44,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3196320.0, ans=0.125
2023-11-27 19:09:53,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3196386.6666666665, ans=0.1
2023-11-27 19:09:55,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3196386.6666666665, ans=0.0
2023-11-27 19:10:03,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196386.6666666665, ans=0.1
2023-11-27 19:10:16,046 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10550, loss[loss=0.06718, simple_loss=0.08091, pruned_loss=0.01317, audio_tagging_loss=0.01355, over 15458.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09014, pruned_loss=0.01243, audio_tagging_loss=0.008773, over 3051752.50 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:10:39,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479500
2023-11-27 19:10:41,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3196653.3333333335, ans=0.125
2023-11-27 19:11:05,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0
2023-11-27 19:11:11,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3196786.6666666665, ans=0.0
2023-11-27 19:11:13,704 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10600, loss[loss=0.08167, simple_loss=0.1167, pruned_loss=0.01618, audio_tagging_loss=0.007148, over 15123.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09021, pruned_loss=0.01251, audio_tagging_loss=0.00866, over 3054729.05 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:11:15,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.628e+01 9.441e+01 1.014e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-27 19:11:18,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3196853.3333333335, ans=0.2
2023-11-27 19:11:22,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3196853.3333333335, ans=0.0
2023-11-27 19:11:30,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3196920.0, ans=0.0
2023-11-27 19:11:36,975 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479550
2023-11-27 19:11:38,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2023-11-27 19:11:43,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3196986.6666666665, ans=0.0
2023-11-27 19:12:01,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3197120.0, ans=0.05
2023-11-27 19:12:04,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3197120.0, ans=0.0
2023-11-27 19:12:09,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197120.0, ans=0.1
2023-11-27 19:12:11,329 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10650, loss[loss=0.07069, simple_loss=0.09836, pruned_loss=0.01262, audio_tagging_loss=0.008884, over 15344.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09038, pruned_loss=0.01262, audio_tagging_loss=0.008562, over 3052792.97 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:12:18,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0
2023-11-27 19:12:34,480 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479600
2023-11-27 19:12:34,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3197320.0, ans=0.125
2023-11-27 19:12:50,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3197386.6666666665, ans=0.125
2023-11-27 19:13:09,081 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10700, loss[loss=0.06346, simple_loss=0.09444, pruned_loss=0.009515, audio_tagging_loss=0.006732, over 14551.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09049, pruned_loss=0.01263, audio_tagging_loss=0.008589, over 3053297.95 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:13:11,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.717e+01 9.252e+01 9.839e+01 1.176e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-27 19:13:32,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479650
2023-11-27 19:13:48,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3197720.0, ans=0.05
2023-11-27 19:13:54,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5
2023-11-27 19:14:06,955 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10750, loss[loss=0.07552, simple_loss=0.103, pruned_loss=0.01533, audio_tagging_loss=0.008696, over 14156.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09066, pruned_loss=0.01262, audio_tagging_loss=0.008541, over 3051616.64 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:14:25,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3197920.0, ans=0.05
2023-11-27 19:14:29,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479700
2023-11-27 19:14:41,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3198053.3333333335, ans=0.0
2023-11-27 19:15:04,573 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10800, loss[loss=0.07235, simple_loss=0.09613, pruned_loss=0.01615, audio_tagging_loss=0.008139, over 15458.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09011, pruned_loss=0.01257, audio_tagging_loss=0.008523, over 3048816.34 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0
2023-11-27 19:15:04,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3198186.6666666665, ans=0.035
2023-11-27 19:15:05,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198186.6666666665, ans=0.1
2023-11-27 19:15:06,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.494e+01 9.274e+01 9.978e+01 1.190e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 19:15:27,740 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479750
2023-11-27 19:16:02,284 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10850, loss[loss=0.0777, simple_loss=0.1003, pruned_loss=0.01593, audio_tagging_loss=0.01161, over 15126.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08969, pruned_loss=0.01239, audio_tagging_loss=0.008554, over 3052019.72 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0
2023-11-27 19:16:25,208 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479800
2023-11-27 19:16:39,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3198720.0, ans=10.0
2023-11-27 19:16:40,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3198720.0, ans=0.125
2023-11-27 19:16:45,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3198720.0, ans=0.04949747468305833
2023-11-27 19:16:49,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3198786.6666666665, ans=0.95
2023-11-27 19:16:57,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3198786.6666666665, ans=0.0
2023-11-27 19:17:00,004 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10900, loss[loss=0.05647, simple_loss=0.07741, pruned_loss=0.006715, audio_tagging_loss=0.01105, over 15307.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09008, pruned_loss=0.01245, audio_tagging_loss=0.008591, over 3049713.12 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0
2023-11-27 19:17:00,042 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 19:17:01,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3198853.3333333335, ans=0.125
2023-11-27 19:17:02,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.922e+01 9.500e+01 1.014e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-27 19:17:13,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=15.0
2023-11-27 19:17:22,633 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479850
2023-11-27 19:17:22,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3198986.6666666665, ans=0.07
2023-11-27 19:17:37,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3199053.3333333335, ans=0.025
2023-11-27 19:17:52,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3199120.0, ans=0.2
2023-11-27 19:17:56,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3199186.6666666665, ans=0.95
2023-11-27 19:17:57,555 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 10950, loss[loss=0.06896, simple_loss=0.09406, pruned_loss=0.01462, audio_tagging_loss=0.00731, over 15179.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08955, pruned_loss=0.01245, audio_tagging_loss=0.008657, over 3054903.38 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:18:16,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3199253.3333333335, ans=0.125
2023-11-27 19:18:20,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479900
2023-11-27 19:18:27,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3199320.0, ans=0.125
2023-11-27 19:18:33,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3199386.6666666665, ans=0.125
2023-11-27 19:18:33,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3199386.6666666665, ans=0.125
2023-11-27 19:18:54,494 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11000, loss[loss=0.06023, simple_loss=0.07709, pruned_loss=0.01157, audio_tagging_loss=0.01011, over 13768.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09026, pruned_loss=0.01254, audio_tagging_loss=0.008681, over 3057150.66 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:18:57,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.669e+01 9.375e+01 1.024e+02 1.386e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-27 19:19:07,851 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 19:19:17,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3199653.3333333335, ans=0.07
2023-11-27 19:19:18,259 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 479950
2023-11-27 19:19:29,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3199720.0, ans=0.2
2023-11-27 19:19:51,854 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11050, loss[loss=0.05593, simple_loss=0.07085, pruned_loss=0.01049, audio_tagging_loss=0.01001, over 14655.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08993, pruned_loss=0.01256, audio_tagging_loss=0.008768, over 3056033.10 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:19:56,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3199853.3333333335, ans=0.1
2023-11-27 19:20:03,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2023-11-27 19:20:10,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-27 19:20:15,032 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480000
2023-11-27 19:20:15,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3199986.6666666665, ans=0.0
2023-11-27 19:20:35,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0
2023-11-27 19:20:39,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3200120.0, ans=0.125
2023-11-27 19:20:41,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3200120.0, ans=0.1
2023-11-27 19:20:47,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0
2023-11-27 19:20:50,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0
2023-11-27 19:20:51,374 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11100, loss[loss=0.05789, simple_loss=0.08181, pruned_loss=0.009031, audio_tagging_loss=0.007949, over 14935.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09084, pruned_loss=0.01264, audio_tagging_loss=0.008795, over 3051615.68 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:20:56,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.813e+01 9.363e+01 1.015e+02 1.283e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-27 19:21:13,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480050
2023-11-27 19:21:24,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3200320.0, ans=0.025
2023-11-27 19:21:37,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3200453.3333333335, ans=0.07
2023-11-27 19:21:39,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3200453.3333333335, ans=0.07
2023-11-27 19:21:44,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3200453.3333333335, ans=0.125
2023-11-27 19:21:46,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2023-11-27 19:21:49,151 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11150, loss[loss=0.06875, simple_loss=0.1002, pruned_loss=0.01128, audio_tagging_loss=0.007391, over 15368.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09056, pruned_loss=0.01267, audio_tagging_loss=0.008908, over 3047620.45 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:21:49,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3200520.0, ans=0.125
2023-11-27 19:21:52,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3200520.0, ans=0.125
2023-11-27 19:22:00,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3200586.6666666665, ans=0.125
2023-11-27 19:22:12,516 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480100
2023-11-27 19:22:27,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3200720.0, ans=0.125
2023-11-27 19:22:46,470 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11200, loss[loss=0.06875, simple_loss=0.09158, pruned_loss=0.01396, audio_tagging_loss=0.008995, over 15946.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09009, pruned_loss=0.0126, audio_tagging_loss=0.009019, over 3047561.32 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:22:51,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.755e+01 9.520e+01 1.002e+02 1.290e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-27 19:22:58,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3200920.0, ans=0.2
2023-11-27 19:23:10,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480150
2023-11-27 19:23:36,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3201120.0, ans=0.0
2023-11-27 19:23:44,370 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11250, loss[loss=0.0612, simple_loss=0.08268, pruned_loss=0.01202, audio_tagging_loss=0.00784, over 15408.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08962, pruned_loss=0.01258, audio_tagging_loss=0.009093, over 3042144.82 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:23:45,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3201186.6666666665, ans=0.2
2023-11-27 19:23:45,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-27 19:24:05,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3201253.3333333335, ans=0.125
2023-11-27 19:24:06,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3201320.0, ans=0.02
2023-11-27 19:24:07,141 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480200
2023-11-27 19:24:11,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2023-11-27 19:24:36,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3201453.3333333335, ans=0.0
2023-11-27 19:24:42,367 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11300, loss[loss=0.08979, simple_loss=0.1213, pruned_loss=0.01917, audio_tagging_loss=0.009956, over 15104.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08982, pruned_loss=0.01279, audio_tagging_loss=0.008957, over 3040710.78 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:24:47,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.774e+01 9.523e+01 1.010e+02 1.222e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-27 19:24:51,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. limit=10.0
2023-11-27 19:24:56,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3201586.6666666665, ans=0.0
2023-11-27 19:25:00,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3201586.6666666665, ans=0.0
2023-11-27 19:25:01,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5
2023-11-27 19:25:04,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3201653.3333333335, ans=0.125
2023-11-27 19:25:05,805 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480250
2023-11-27 19:25:27,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0
2023-11-27 19:25:31,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3201786.6666666665, ans=0.1
2023-11-27 19:25:33,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3201786.6666666665, ans=0.025
2023-11-27 19:25:37,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3201786.6666666665, ans=0.125
2023-11-27 19:25:37,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3201786.6666666665, ans=0.125
2023-11-27 19:25:39,725 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11350, loss[loss=0.07292, simple_loss=0.1005, pruned_loss=0.01244, audio_tagging_loss=0.01021, over 14733.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09036, pruned_loss=0.01286, audio_tagging_loss=0.008837, over 3045449.75 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:25:51,888 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 19:25:52,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3201920.0, ans=0.125
2023-11-27 19:26:03,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480300
2023-11-27 19:26:37,704 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11400, loss[loss=0.06397, simple_loss=0.08612, pruned_loss=0.0112, audio_tagging_loss=0.009708, over 15787.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09004, pruned_loss=0.01283, audio_tagging_loss=0.008863, over 3041140.01 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 8.0
], batch size: 58, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:26:39,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3202186.6666666665, ans=0.125 2023-11-27 19:26:43,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.795e+01 9.431e+01 1.004e+02 1.426e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 19:27:00,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-27 19:27:17,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3202386.6666666665, ans=0.125 2023-11-27 19:27:21,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3202386.6666666665, ans=0.2 2023-11-27 19:27:24,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-11-27 19:27:32,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3202453.3333333335, ans=0.125 2023-11-27 19:27:33,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202453.3333333335, ans=0.1 2023-11-27 19:27:35,247 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11450, loss[loss=0.07313, simple_loss=0.1096, pruned_loss=0.01268, audio_tagging_loss=0.00566, over 14973.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08947, pruned_loss=0.01267, audio_tagging_loss=0.008812, over 3036563.61 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:27:44,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3202520.0, ans=0.1 2023-11-27 19:27:51,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3202586.6666666665, ans=0.2 2023-11-27 19:27:57,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-27 19:28:02,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=22.5 2023-11-27 19:28:19,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3202720.0, ans=0.2 2023-11-27 19:28:31,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3202853.3333333335, ans=0.0 2023-11-27 19:28:32,568 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11500, loss[loss=0.07728, simple_loss=0.1052, pruned_loss=0.01838, audio_tagging_loss=0.006293, over 15622.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08944, pruned_loss=0.0125, audio_tagging_loss=0.008775, over 3036756.55 frames. 
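[Annotation] The Whitening lines compare a per-module statistic against a limit (e.g. metric=5.81 vs. limit=6.0 for whiten_keys above); plausibly the metric measures how far the feature covariance is from a multiple of the identity, equalling 1.0 for perfectly "white" activations and growing as variance concentrates in a few directions. A sketch under that assumption (grouping over channels is omitted, and the exact definition in scaling.py may differ):

    # d * sum(lambda_i^2) / (sum lambda_i)^2 over the eigenvalues of the
    # feature covariance: 1.0 for isotropic features, larger otherwise.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        sum_eig = cov.diagonal().sum()   # trace = sum of eigenvalues
        sum_eig_sq = (cov * cov).sum()   # trace(C @ C) = sum of squared eigenvalues
        return (d * sum_eig_sq / (sum_eig * sum_eig)).item()

    print(whitening_metric(torch.randn(1000, 256)))  # ~1.0 for white noise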
], batch size: 59, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:28:38,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 9.056e+01 9.508e+01 1.039e+02 1.307e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 19:28:49,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202920.0, ans=0.1 2023-11-27 19:28:56,700 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-27 19:28:56,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3202986.6666666665, ans=0.0 2023-11-27 19:29:03,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.10 vs. limit=15.0 2023-11-27 19:29:08,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3203053.3333333335, ans=0.125 2023-11-27 19:29:20,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3203120.0, ans=22.5 2023-11-27 19:29:24,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3203120.0, ans=0.0 2023-11-27 19:29:30,435 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11550, loss[loss=0.05568, simple_loss=0.0784, pruned_loss=0.006675, audio_tagging_loss=0.00981, over 15550.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08994, pruned_loss=0.01244, audio_tagging_loss=0.008776, over 3041027.80 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:29:32,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3203186.6666666665, ans=0.0 2023-11-27 19:29:39,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-11-27 19:29:42,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3203253.3333333335, ans=0.07 2023-11-27 19:29:46,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3203253.3333333335, ans=0.125 2023-11-27 19:29:50,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3203253.3333333335, ans=0.05 2023-11-27 19:29:53,591 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-27 19:29:57,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3203320.0, ans=22.5 2023-11-27 19:30:08,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3203386.6666666665, ans=10.0 2023-11-27 19:30:09,442 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 19:30:10,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3203386.6666666665, ans=0.07 2023-11-27 19:30:26,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3203453.3333333335, ans=0.125 2023-11-27 19:30:28,600 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11600, loss[loss=0.05499, simple_loss=0.0746, pruned_loss=0.009678, audio_tagging_loss=0.008006, over 15273.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.0907, pruned_loss=0.01242, audio_tagging_loss=0.008666, over 3054574.66 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:30:28,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3203520.0, ans=0.125 2023-11-27 19:30:28,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3203520.0, ans=0.125 2023-11-27 19:30:33,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.753e+01 9.625e+01 1.023e+02 1.677e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 19:30:35,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3203520.0, ans=0.0 2023-11-27 19:30:50,957 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-27 19:30:53,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3203653.3333333335, ans=0.1 2023-11-27 19:31:21,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3203786.6666666665, ans=0.125 2023-11-27 19:31:24,675 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11650, loss[loss=0.05595, simple_loss=0.07422, pruned_loss=0.01141, audio_tagging_loss=0.007431, over 15344.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09099, pruned_loss=0.01245, audio_tagging_loss=0.008682, over 3053818.78 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:31:29,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3203853.3333333335, ans=0.2 2023-11-27 19:31:40,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. 
limit=15.0 2023-11-27 19:31:47,490 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-27 19:31:51,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3203986.6666666665, ans=0.0 2023-11-27 19:31:56,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3203986.6666666665, ans=0.125 2023-11-27 19:32:10,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3204120.0, ans=0.125 2023-11-27 19:32:14,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3204120.0, ans=0.125 2023-11-27 19:32:21,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3204186.6666666665, ans=0.125 2023-11-27 19:32:21,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3204186.6666666665, ans=0.0 2023-11-27 19:32:22,252 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11700, loss[loss=0.06272, simple_loss=0.08507, pruned_loss=0.01013, audio_tagging_loss=0.01005, over 15784.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09027, pruned_loss=0.0125, audio_tagging_loss=0.008815, over 3043305.75 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:32:28,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.724e+01 9.258e+01 1.003e+02 1.518e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 19:32:29,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3204186.6666666665, ans=0.1 2023-11-27 19:32:38,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3204253.3333333335, ans=0.05 2023-11-27 19:32:38,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3204253.3333333335, ans=0.0 2023-11-27 19:32:44,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3204320.0, ans=0.0 2023-11-27 19:32:45,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-27 19:32:51,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3204320.0, ans=15.0 2023-11-27 19:33:00,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3204386.6666666665, ans=0.125 2023-11-27 19:33:20,256 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11750, loss[loss=0.06892, simple_loss=0.08605, pruned_loss=0.01195, audio_tagging_loss=0.01395, over 15635.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08991, pruned_loss=0.01254, audio_tagging_loss=0.008853, over 3040733.63 frames. 
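[Annotation] The per-batch loss decomposes as a weighted sum of its parts: the figures are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for the batch 11700 line above, 0.5 * 0.08507 + 0.01013 + 0.01005 ~= 0.06272). A small check, with the scales inferred from the numbers rather than read from the code:

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
        return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

    # batch 11700: matches the logged loss=0.06272 to rounding
    print(total_loss(0.08507, 0.01013, 0.01005))  # -> 0.062715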
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:33:23,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3204520.0, ans=0.125 2023-11-27 19:33:28,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3204520.0, ans=0.0 2023-11-27 19:33:29,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3204520.0, ans=0.0 2023-11-27 19:33:29,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3204520.0, ans=0.125 2023-11-27 19:33:43,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-27 19:34:18,045 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11800, loss[loss=0.07233, simple_loss=0.1069, pruned_loss=0.01249, audio_tagging_loss=0.006371, over 15480.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08909, pruned_loss=0.0125, audio_tagging_loss=0.008892, over 3035353.86 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:34:23,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.512e+01 9.140e+01 9.806e+01 1.375e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-27 19:34:36,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3204920.0, ans=0.0 2023-11-27 19:34:40,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-27 19:34:55,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3205053.3333333335, ans=0.125 2023-11-27 19:35:07,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=12.0 2023-11-27 19:35:15,456 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11850, loss[loss=0.07514, simple_loss=0.1041, pruned_loss=0.01632, audio_tagging_loss=0.006757, over 15637.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08973, pruned_loss=0.01253, audio_tagging_loss=0.008844, over 3038934.95 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:35:27,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3205253.3333333335, ans=0.07 2023-11-27 19:35:39,086 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-27 19:35:46,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205320.0, ans=0.1 2023-11-27 19:35:51,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3205386.6666666665, ans=0.0 2023-11-27 19:36:00,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0 2023-11-27 19:36:13,849 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11900, loss[loss=0.06532, simple_loss=0.09285, pruned_loss=0.009836, audio_tagging_loss=0.009059, over 15021.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09054, pruned_loss=0.01278, audio_tagging_loss=0.008874, over 3044522.35 frames. 
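[Annotation] tot_loss is a frame-weighted running aggregate rather than a plain epoch average: the fractional frame totals (e.g. "over 3044522.35 frames" above) and the way the window refills from a single batch's frames at the start of an epoch both point to an exponentially decayed sum. A sketch under that assumption, with an invented decay constant:

    # Frame-weighted running average with exponential decay, which would
    # produce fractional frame totals like those in the log.
    class RunningLoss:
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / self.frames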
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:36:18,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3205520.0, ans=0.125 2023-11-27 19:36:19,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.961e+01 9.645e+01 1.029e+02 1.669e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 19:36:37,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-27 19:36:50,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3205720.0, ans=0.125 2023-11-27 19:36:54,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3205720.0, ans=0.125 2023-11-27 19:37:11,381 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 11950, loss[loss=0.07866, simple_loss=0.1081, pruned_loss=0.0154, audio_tagging_loss=0.009205, over 15013.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09033, pruned_loss=0.01272, audio_tagging_loss=0.008953, over 3047678.64 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:37:20,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3205853.3333333335, ans=0.2 2023-11-27 19:37:20,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3205853.3333333335, ans=0.5 2023-11-27 19:37:23,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3205920.0, ans=0.1 2023-11-27 19:37:34,404 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-27 19:37:58,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.08 vs. limit=10.0 2023-11-27 19:38:07,392 INFO [train_asr.py:1235] (1/4) Epoch 40, batch 12000, loss[loss=0.07545, simple_loss=0.09915, pruned_loss=0.0152, audio_tagging_loss=0.01068, over 15399.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09125, pruned_loss=0.01285, audio_tagging_loss=0.008979, over 3052529.40 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:38:07,393 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 19:38:41,931 INFO [train_asr.py:1267] (1/4) Epoch 40, validation: loss=0.05781, simple_loss=0.05069, pruned_loss=0.005234, audio_tagging_loss=0.02723, over 4681554.00 frames. 2023-11-27 19:38:41,932 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 19:38:47,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.885e+01 9.490e+01 1.034e+02 1.237e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:38:47,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.00 vs. 
limit=22.5 2023-11-27 19:39:00,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3206253.3333333335, ans=0.125 2023-11-27 19:39:00,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3206253.3333333335, ans=0.0 2023-11-27 19:39:02,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-27 19:39:05,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2023-11-27 19:39:26,323 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 0, loss[loss=0.07847, simple_loss=0.0889, pruned_loss=0.009883, audio_tagging_loss=0.02414, over 14691.00 frames. ], tot_loss[loss=0.07847, simple_loss=0.0889, pruned_loss=0.009883, audio_tagging_loss=0.02414, over 14691.00 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:39:26,324 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 19:40:00,214 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005197, audio_tagging_loss=0.0273, over 4681554.00 frames. 2023-11-27 19:40:00,215 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 19:40:00,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3206360.0, ans=0.0 2023-11-27 19:40:07,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-27 19:40:22,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3206493.3333333335, ans=0.125 2023-11-27 19:40:43,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=10.0 2023-11-27 19:40:50,935 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-27 19:40:57,768 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 50, loss[loss=0.05208, simple_loss=0.05916, pruned_loss=0.00611, audio_tagging_loss=0.01639, over 14751.00 frames. ], tot_loss[loss=0.07653, simple_loss=0.09391, pruned_loss=0.01341, audio_tagging_loss=0.01616, over 691396.91 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:41:14,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3206760.0, ans=0.125 2023-11-27 19:41:18,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3206760.0, ans=0.125 2023-11-27 19:41:27,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3206826.6666666665, ans=0.2 2023-11-27 19:41:29,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3206826.6666666665, ans=0.0 2023-11-27 19:41:31,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.368e+01 1.003e+02 1.103e+02 1.548e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-27 19:41:34,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. 
limit=15.0 2023-11-27 19:41:36,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3206893.3333333335, ans=0.125 2023-11-27 19:41:48,707 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-27 19:41:53,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3206960.0, ans=0.1 2023-11-27 19:41:55,743 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 100, loss[loss=0.08095, simple_loss=0.1009, pruned_loss=0.01708, audio_tagging_loss=0.01342, over 16563.00 frames. ], tot_loss[loss=0.07482, simple_loss=0.09176, pruned_loss=0.01306, audio_tagging_loss=0.01588, over 1214850.31 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:42:00,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3207026.6666666665, ans=0.125 2023-11-27 19:42:01,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2023-11-27 19:42:07,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-27 19:42:07,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1 2023-11-27 19:42:13,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3207093.3333333335, ans=0.0 2023-11-27 19:42:19,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=6.0 2023-11-27 19:42:34,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3207226.6666666665, ans=0.0 2023-11-27 19:42:37,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2023-11-27 19:42:37,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3207226.6666666665, ans=0.125 2023-11-27 19:42:46,577 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-27 19:42:53,756 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 150, loss[loss=0.08455, simple_loss=0.1128, pruned_loss=0.01939, audio_tagging_loss=0.008782, over 15504.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09087, pruned_loss=0.01277, audio_tagging_loss=0.01432, over 1614931.21 frames. 
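[Annotation] The validation passes above (at Epoch 40 batch 12000 and at the start of Epoch 41) report a frame-weighted loss over the same 4681554.00 frames and then the peak CUDA memory. A generic sketch of such a pass; the compute_loss signature is an assumption:

    import torch

    def run_validation(model, valid_dl, compute_loss, device="cuda:1"):
        model.eval()
        tot, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch)
                tot += loss.item() * num_frames
                frames += num_frames
        model.train()
        mb = torch.cuda.max_memory_allocated(torch.device(device)) // (1024 * 1024)
        print(f"validation: loss={tot / frames:.4f}, over {frames:.2f} frames; "
              f"Maximum memory allocated so far is {mb}MB")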
], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:42:58,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3207360.0, ans=0.125 2023-11-27 19:43:04,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207426.6666666665, ans=0.1 2023-11-27 19:43:11,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3207426.6666666665, ans=0.0 2023-11-27 19:43:12,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3207426.6666666665, ans=0.05 2023-11-27 19:43:12,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3207426.6666666665, ans=0.125 2023-11-27 19:43:28,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.150e+01 9.036e+01 9.587e+01 1.014e+02 1.345e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 19:43:36,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3207560.0, ans=0.125 2023-11-27 19:43:44,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-27 19:43:51,436 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 200, loss[loss=0.07747, simple_loss=0.1065, pruned_loss=0.01593, audio_tagging_loss=0.008273, over 14278.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09004, pruned_loss=0.01254, audio_tagging_loss=0.01287, over 1919415.91 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:43:52,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207693.3333333335, ans=0.1 2023-11-27 19:44:00,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3207693.3333333335, ans=0.05 2023-11-27 19:44:42,430 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-27 19:44:42,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2023-11-27 19:44:49,824 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 250, loss[loss=0.06929, simple_loss=0.08391, pruned_loss=0.01608, audio_tagging_loss=0.01125, over 15143.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09096, pruned_loss=0.01272, audio_tagging_loss=0.01165, over 2179413.06 frames. 
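[Annotation] The learning rate steps from 1.68e-03 to 1.66e-03 at the Epoch 40 -> 41 boundary while staying flat within an epoch, which matches an Eden-style schedule (as used in icefall's zipformer recipes) that decays smoothly in both batch and epoch; the lr_batches/lr_epochs constants below are assumptions for illustration:

    # Eden-style learning-rate schedule; reproduces the logged values to
    # rounding with these (assumed) constants.
    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # "epoch" here counts finished epochs: 39 during epoch 40, 40 during 41.
    print(eden_lr(0.045, 481000, 39))  # ~1.68e-03
    print(eden_lr(0.045, 481000, 40))  # ~1.66e-03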
], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:45:09,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3208093.3333333335, ans=0.125 2023-11-27 19:45:23,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 9.216e+01 9.866e+01 1.064e+02 1.717e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-27 19:45:40,421 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-27 19:45:41,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208293.3333333335, ans=0.125 2023-11-27 19:45:47,469 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 300, loss[loss=0.0911, simple_loss=0.1236, pruned_loss=0.02222, audio_tagging_loss=0.007104, over 14623.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09119, pruned_loss=0.01279, audio_tagging_loss=0.01084, over 2374567.10 frames. ], batch size: 50, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:45:58,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-11-27 19:46:14,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3208493.3333333335, ans=0.125 2023-11-27 19:46:14,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2023-11-27 19:46:20,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-27 19:46:23,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3208560.0, ans=0.0 2023-11-27 19:46:38,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-27 19:46:44,691 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 350, loss[loss=0.05662, simple_loss=0.06772, pruned_loss=0.01021, audio_tagging_loss=0.01255, over 15606.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09108, pruned_loss=0.01284, audio_tagging_loss=0.0102, over 2524915.50 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:46:47,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3208693.3333333335, ans=0.2 2023-11-27 19:46:55,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3208760.0, ans=0.125 2023-11-27 19:47:08,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. 
limit=15.0 2023-11-27 19:47:12,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3208826.6666666665, ans=0.0 2023-11-27 19:47:16,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-27 19:47:17,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3208826.6666666665, ans=0.2 2023-11-27 19:47:19,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.707e+01 9.326e+01 9.986e+01 1.163e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 19:47:20,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3208893.3333333335, ans=0.125 2023-11-27 19:47:35,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-27 19:47:40,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3208960.0, ans=0.125 2023-11-27 19:47:43,122 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 400, loss[loss=0.06246, simple_loss=0.08828, pruned_loss=0.009817, audio_tagging_loss=0.008506, over 15487.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.08988, pruned_loss=0.01248, audio_tagging_loss=0.009876, over 2637891.56 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:47:44,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3209026.6666666665, ans=0.125 2023-11-27 19:48:03,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3209093.3333333335, ans=0.0 2023-11-27 19:48:04,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3209160.0, ans=0.125 2023-11-27 19:48:08,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=12.0 2023-11-27 19:48:33,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-27 19:48:34,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.95 vs. limit=10.0 2023-11-27 19:48:40,637 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 450, loss[loss=0.03991, simple_loss=0.0479, pruned_loss=0.005265, audio_tagging_loss=0.0107, over 15375.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.08988, pruned_loss=0.01251, audio_tagging_loss=0.009656, over 2722277.23 frames. 
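[Annotation] The recurring "Freeze_encoder: False; Current batch idx: N" lines land every 50 batches (481400, 481450, ...), i.e. simple periodic status logging from model.py. The guard below is an assumption about how such a line is emitted, not the file's actual code:

    import logging

    def maybe_log_status(batch_idx: int, freeze_encoder: bool,
                         every: int = 50) -> None:
        if batch_idx % every == 0:
            logging.info(f"Freeze_encoder: {freeze_encoder}; "
                         f"Current batch idx: {batch_idx}")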
], batch size: 62, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:16,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.566e+01 9.069e+01 9.742e+01 1.634e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-27 19:49:17,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3209560.0, ans=0.0 2023-11-27 19:49:19,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3209560.0, ans=0.2 2023-11-27 19:49:31,318 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-27 19:49:31,490 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:49:37,849 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 500, loss[loss=0.05879, simple_loss=0.08434, pruned_loss=0.009035, audio_tagging_loss=0.00758, over 14329.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08971, pruned_loss=0.01244, audio_tagging_loss=0.00947, over 2793968.67 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:44,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3209693.3333333335, ans=0.125 2023-11-27 19:49:58,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3209760.0, ans=0.0 2023-11-27 19:50:10,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2023-11-27 19:50:10,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3209826.6666666665, ans=0.0 2023-11-27 19:50:13,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3209893.3333333335, ans=0.04949747468305833 2023-11-27 19:50:25,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3209960.0, ans=0.05 2023-11-27 19:50:28,334 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-27 19:50:36,107 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 550, loss[loss=0.07274, simple_loss=0.1018, pruned_loss=0.01372, audio_tagging_loss=0.008113, over 15551.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08978, pruned_loss=0.01247, audio_tagging_loss=0.009281, over 2853518.93 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:51:01,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3210160.0, ans=0.125 2023-11-27 19:51:11,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.776e+01 9.400e+01 1.030e+02 1.375e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 19:51:14,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. limit=10.0 2023-11-27 19:51:19,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. 
limit=15.0 2023-11-27 19:51:27,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-27 19:51:33,516 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 600, loss[loss=0.06797, simple_loss=0.09421, pruned_loss=0.01119, audio_tagging_loss=0.009681, over 15439.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.089, pruned_loss=0.01233, audio_tagging_loss=0.009335, over 2897341.33 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:51:39,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3210360.0, ans=0.125 2023-11-27 19:51:46,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3210426.6666666665, ans=0.125 2023-11-27 19:52:00,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3210493.3333333335, ans=0.0 2023-11-27 19:52:02,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3210493.3333333335, ans=0.0 2023-11-27 19:52:25,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-27 19:52:32,055 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 650, loss[loss=0.07946, simple_loss=0.1044, pruned_loss=0.01583, audio_tagging_loss=0.01141, over 15875.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08885, pruned_loss=0.01231, audio_tagging_loss=0.009323, over 2939614.45 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:53:09,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.855e+01 9.490e+01 1.019e+02 1.294e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:53:22,851 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-27 19:53:22,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3210960.0, ans=0.125 2023-11-27 19:53:29,907 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 700, loss[loss=0.04966, simple_loss=0.05817, pruned_loss=0.009413, audio_tagging_loss=0.01116, over 15280.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08906, pruned_loss=0.01233, audio_tagging_loss=0.009177, over 2959995.28 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:53:31,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3211026.6666666665, ans=0.1 2023-11-27 19:53:31,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. 
limit=15.0 2023-11-27 19:53:33,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3211026.6666666665, ans=0.125 2023-11-27 19:53:35,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3211026.6666666665, ans=0.0 2023-11-27 19:53:37,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3211026.6666666665, ans=0.125 2023-11-27 19:53:49,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3211093.3333333335, ans=0.125 2023-11-27 19:53:52,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2023-11-27 19:53:52,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2023-11-27 19:54:01,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3211160.0, ans=0.2 2023-11-27 19:54:07,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3211226.6666666665, ans=0.0 2023-11-27 19:54:13,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3211226.6666666665, ans=0.0 2023-11-27 19:54:20,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-27 19:54:26,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3211360.0, ans=0.125 2023-11-27 19:54:27,681 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 750, loss[loss=0.06101, simple_loss=0.08029, pruned_loss=0.009281, audio_tagging_loss=0.01158, over 15616.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09049, pruned_loss=0.01275, audio_tagging_loss=0.009049, over 2978417.25 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:54:49,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-27 19:55:04,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.938e+01 9.552e+01 1.040e+02 1.357e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 19:55:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3211560.0, ans=0.015 2023-11-27 19:55:04,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3211560.0, ans=0.125 2023-11-27 19:55:19,177 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-27 19:55:19,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3211626.6666666665, ans=0.125 2023-11-27 19:55:19,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2023-11-27 19:55:25,748 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 800, loss[loss=0.05489, simple_loss=0.07091, pruned_loss=0.009565, audio_tagging_loss=0.009869, over 15410.00 frames. 
], tot_loss[loss=0.06704, simple_loss=0.09043, pruned_loss=0.01274, audio_tagging_loss=0.009084, over 2994800.62 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:55:42,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2023-11-27 19:56:13,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3211960.0, ans=0.0 2023-11-27 19:56:16,542 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-27 19:56:23,310 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 850, loss[loss=0.06677, simple_loss=0.08782, pruned_loss=0.01315, audio_tagging_loss=0.009711, over 15595.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.08966, pruned_loss=0.0126, audio_tagging_loss=0.009188, over 3005309.74 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:56:24,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:29,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:32,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:36,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3212093.3333333335, ans=0.0 2023-11-27 19:56:45,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3212093.3333333335, ans=0.125 2023-11-27 19:56:58,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=15.0 2023-11-27 19:56:58,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3212226.6666666665, ans=0.2 2023-11-27 19:57:00,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.782e+01 9.230e+01 1.009e+02 1.508e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 19:57:00,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3212226.6666666665, ans=0.125 2023-11-27 19:57:09,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3212293.3333333335, ans=0.2 2023-11-27 19:57:12,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3212293.3333333335, ans=0.125 2023-11-27 19:57:12,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3212293.3333333335, ans=0.125 2023-11-27 19:57:14,859 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-27 19:57:21,337 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 900, loss[loss=0.06604, simple_loss=0.08454, pruned_loss=0.01291, audio_tagging_loss=0.01086, over 15632.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08901, pruned_loss=0.01257, audio_tagging_loss=0.009307, over 3014694.19 frames. 
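[Annotation] The grad_scale value in the batch lines halves from 32 to 16 to 8 and later recovers, the signature of torch.cuda.amp.GradScaler under fp16 training: the scale is cut when a step overflows and grown again after a run of clean steps. A generic loop showing that interaction (the growth_interval value is an assumption):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def training_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads hit inf/nan
        scaler.update()          # halves the scale on overflow, else may grow it
        return loss.detach(), scaler.get_scale()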
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:57:44,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.23 vs. limit=15.0 2023-11-27 19:57:50,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3212493.3333333335, ans=0.0 2023-11-27 19:57:57,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3212560.0, ans=0.125 2023-11-27 19:58:03,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3212560.0, ans=0.0 2023-11-27 19:58:05,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:58:09,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3212626.6666666665, ans=0.0 2023-11-27 19:58:12,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-27 19:58:14,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3212626.6666666665, ans=0.125 2023-11-27 19:58:17,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3212626.6666666665, ans=0.125 2023-11-27 19:58:19,366 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 950, loss[loss=0.06639, simple_loss=0.09007, pruned_loss=0.01256, audio_tagging_loss=0.008788, over 15553.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08942, pruned_loss=0.01274, audio_tagging_loss=0.009156, over 3021589.59 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:58:33,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3212760.0, ans=0.0 2023-11-27 19:58:56,314 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 9.001e+01 9.830e+01 1.057e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 19:59:10,369 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-27 19:59:13,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212960.0, ans=0.1 2023-11-27 19:59:15,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2023-11-27 19:59:16,915 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1000, loss[loss=0.05727, simple_loss=0.07709, pruned_loss=0.009405, audio_tagging_loss=0.009318, over 14958.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.0893, pruned_loss=0.01264, audio_tagging_loss=0.008931, over 3020805.79 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:59:40,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=8.0 2023-11-27 19:59:44,568 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:59:51,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3213226.6666666665, ans=0.0 2023-11-27 20:00:07,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3213293.3333333335, ans=0.0 2023-11-27 20:00:08,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-27 20:00:15,088 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1050, loss[loss=0.06129, simple_loss=0.07896, pruned_loss=0.01057, audio_tagging_loss=0.01124, over 15562.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08806, pruned_loss=0.01234, audio_tagging_loss=0.008835, over 3025827.42 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:00:32,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-27 20:00:33,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-27 20:00:33,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3213426.6666666665, ans=0.0 2023-11-27 20:00:51,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.496e+01 9.150e+01 1.002e+02 1.300e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 20:00:52,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=15.0 2023-11-27 20:01:03,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3213626.6666666665, ans=0.0 2023-11-27 20:01:05,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3213626.6666666665, ans=0.125 2023-11-27 20:01:05,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-27 20:01:13,677 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1100, loss[loss=0.08512, simple_loss=0.1188, pruned_loss=0.01872, audio_tagging_loss=0.006981, over 15390.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08825, pruned_loss=0.01237, audio_tagging_loss=0.008716, over 3027921.53 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:01:18,153 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:01:28,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. 
limit=15.0 2023-11-27 20:01:31,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3213760.0, ans=0.09899494936611666 2023-11-27 20:01:38,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3213826.6666666665, ans=0.125 2023-11-27 20:01:50,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3213893.3333333335, ans=0.0 2023-11-27 20:02:04,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-27 20:02:10,950 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1150, loss[loss=0.06411, simple_loss=0.09062, pruned_loss=0.009575, audio_tagging_loss=0.009229, over 13914.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08873, pruned_loss=0.01249, audio_tagging_loss=0.008738, over 3025945.10 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:02:12,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3214026.6666666665, ans=0.125 2023-11-27 20:02:19,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3214026.6666666665, ans=0.1 2023-11-27 20:02:35,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3214160.0, ans=0.0 2023-11-27 20:02:36,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3214160.0, ans=0.125 2023-11-27 20:02:36,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2023-11-27 20:02:48,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.534e+01 9.220e+01 9.959e+01 1.599e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 20:02:52,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3214226.6666666665, ans=0.1 2023-11-27 20:02:54,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3214226.6666666665, ans=0.05 2023-11-27 20:03:02,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-27 20:03:09,524 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1200, loss[loss=0.05872, simple_loss=0.0764, pruned_loss=0.009165, audio_tagging_loss=0.01136, over 14879.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.089, pruned_loss=0.01252, audio_tagging_loss=0.008737, over 3027275.13 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:03:11,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-11-27 20:03:36,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-27 20:03:43,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3214560.0, ans=0.125 2023-11-27 20:03:49,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. 
limit=6.0 2023-11-27 20:04:00,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-27 20:04:06,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3214693.3333333335, ans=0.125 2023-11-27 20:04:07,634 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1250, loss[loss=0.09136, simple_loss=0.1289, pruned_loss=0.0199, audio_tagging_loss=0.007024, over 14721.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08902, pruned_loss=0.01232, audio_tagging_loss=0.008804, over 3032838.21 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:04:08,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3214693.3333333335, ans=0.1 2023-11-27 20:04:44,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.597e+01 9.538e+01 1.019e+02 1.522e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 20:04:58,114 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-27 20:05:05,199 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1300, loss[loss=0.05582, simple_loss=0.08032, pruned_loss=0.009856, audio_tagging_loss=0.005799, over 14783.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.0886, pruned_loss=0.01207, audio_tagging_loss=0.008699, over 3032296.60 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:05:06,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3215026.6666666665, ans=0.125 2023-11-27 20:05:29,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3215160.0, ans=0.125 2023-11-27 20:05:33,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3215160.0, ans=0.125 2023-11-27 20:05:42,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2023-11-27 20:05:44,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3215226.6666666665, ans=0.2 2023-11-27 20:05:44,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3215226.6666666665, ans=0.2 2023-11-27 20:05:48,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3215226.6666666665, ans=0.0 2023-11-27 20:05:55,750 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-27 20:05:59,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3215293.3333333335, ans=0.0 2023-11-27 20:06:03,057 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1350, loss[loss=0.05844, simple_loss=0.07874, pruned_loss=0.01039, audio_tagging_loss=0.008679, over 14595.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08899, pruned_loss=0.01227, audio_tagging_loss=0.008697, over 3035326.73 frames. 
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:06:14,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3215426.6666666665, ans=0.125 2023-11-27 20:06:25,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3215493.3333333335, ans=0.1 2023-11-27 20:06:40,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.606e+01 9.244e+01 9.716e+01 1.166e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 20:06:44,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2023-11-27 20:06:47,036 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:06:53,644 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-27 20:06:55,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3215626.6666666665, ans=0.0 2023-11-27 20:06:58,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3215626.6666666665, ans=0.1 2023-11-27 20:07:00,782 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1400, loss[loss=0.07035, simple_loss=0.09543, pruned_loss=0.01153, audio_tagging_loss=0.0111, over 15216.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08871, pruned_loss=0.0121, audio_tagging_loss=0.008727, over 3044967.29 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:07:16,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3215760.0, ans=0.05 2023-11-27 20:07:43,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3215893.3333333335, ans=0.125 2023-11-27 20:07:46,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3215960.0, ans=0.125 2023-11-27 20:07:47,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3215960.0, ans=0.2 2023-11-27 20:07:51,653 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-27 20:07:54,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3215960.0, ans=0.0 2023-11-27 20:07:58,324 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1450, loss[loss=0.0904, simple_loss=0.1265, pruned_loss=0.01753, audio_tagging_loss=0.009614, over 16123.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08912, pruned_loss=0.01227, audio_tagging_loss=0.008785, over 3051465.01 frames. 
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:08:26,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3216160.0, ans=0.0 2023-11-27 20:08:36,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.746e+01 9.275e+01 1.017e+02 1.401e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 20:08:42,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3216226.6666666665, ans=0.1 2023-11-27 20:08:45,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3216293.3333333335, ans=0.0 2023-11-27 20:08:49,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-27 20:08:56,208 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1500, loss[loss=0.07249, simple_loss=0.09953, pruned_loss=0.01195, audio_tagging_loss=0.01078, over 15075.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08898, pruned_loss=0.01229, audio_tagging_loss=0.008935, over 3048115.13 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:09:16,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3216426.6666666665, ans=0.5 2023-11-27 20:09:16,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.14 vs. limit=22.5 2023-11-27 20:09:27,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-27 20:09:27,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-27 20:09:28,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2023-11-27 20:09:29,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3216560.0, ans=0.0 2023-11-27 20:09:46,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3216626.6666666665, ans=0.125 2023-11-27 20:09:47,308 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-27 20:09:53,910 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1550, loss[loss=0.07284, simple_loss=0.1068, pruned_loss=0.012, audio_tagging_loss=0.007417, over 15369.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08925, pruned_loss=0.01231, audio_tagging_loss=0.00905, over 3053302.67 frames. 
], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:09:59,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-27 20:10:06,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3216760.0, ans=0.125 2023-11-27 20:10:11,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3216760.0, ans=0.0 2023-11-27 20:10:15,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3216826.6666666665, ans=0.5 2023-11-27 20:10:26,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2023-11-27 20:10:27,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3216826.6666666665, ans=0.125 2023-11-27 20:10:32,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.858e+01 9.389e+01 9.907e+01 1.182e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 20:10:37,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3216893.3333333335, ans=0.125 2023-11-27 20:10:42,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3216960.0, ans=0.125 2023-11-27 20:10:45,467 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-27 20:10:51,985 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1600, loss[loss=0.07268, simple_loss=0.09165, pruned_loss=0.01559, audio_tagging_loss=0.01127, over 14089.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08923, pruned_loss=0.01227, audio_tagging_loss=0.009148, over 3053094.48 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:11:04,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3217093.3333333335, ans=0.015 2023-11-27 20:11:11,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3217093.3333333335, ans=0.0 2023-11-27 20:11:19,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-27 20:11:28,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3217226.6666666665, ans=0.0 2023-11-27 20:11:42,441 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-27 20:11:49,951 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1650, loss[loss=0.04672, simple_loss=0.0538, pruned_loss=0.007235, audio_tagging_loss=0.01258, over 15768.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08895, pruned_loss=0.01235, audio_tagging_loss=0.00917, over 3056781.68 frames. 
], batch size: 64, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:11:54,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3217360.0, ans=0.07 2023-11-27 20:12:12,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3217493.3333333335, ans=0.0 2023-11-27 20:12:27,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.730e+01 9.445e+01 1.002e+02 1.391e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 20:12:39,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2023-11-27 20:12:40,728 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-27 20:12:47,280 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1700, loss[loss=0.08628, simple_loss=0.1204, pruned_loss=0.01578, audio_tagging_loss=0.0103, over 15288.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08973, pruned_loss=0.01248, audio_tagging_loss=0.009104, over 3054771.61 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:12:47,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3217693.3333333335, ans=0.1 2023-11-27 20:13:03,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3217760.0, ans=0.2 2023-11-27 20:13:07,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2023-11-27 20:13:25,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3217893.3333333335, ans=0.125 2023-11-27 20:13:31,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3217893.3333333335, ans=0.2 2023-11-27 20:13:35,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3217960.0, ans=0.2 2023-11-27 20:13:38,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-27 20:13:42,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3217960.0, ans=0.2 2023-11-27 20:13:45,087 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1750, loss[loss=0.05759, simple_loss=0.07875, pruned_loss=0.01082, audio_tagging_loss=0.007397, over 14974.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08935, pruned_loss=0.01243, audio_tagging_loss=0.009008, over 3048868.84 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:13:46,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3218026.6666666665, ans=0.125 2023-11-27 20:13:51,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3218026.6666666665, ans=0.0 2023-11-27 20:13:54,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0 2023-11-27 20:14:09,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.40 vs. limit=22.5 2023-11-27 20:14:13,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2023-11-27 20:14:23,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.743e+01 9.232e+01 9.959e+01 1.189e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 20:14:25,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3218226.6666666665, ans=0.2 2023-11-27 20:14:28,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0 2023-11-27 20:14:35,693 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-27 20:14:35,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3218293.3333333335, ans=0.125 2023-11-27 20:14:38,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3218293.3333333335, ans=0.0 2023-11-27 20:14:42,296 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1800, loss[loss=0.05324, simple_loss=0.07596, pruned_loss=0.008154, audio_tagging_loss=0.007103, over 14443.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08974, pruned_loss=0.01246, audio_tagging_loss=0.008848, over 3047139.71 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:14:44,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-27 20:15:01,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3218426.6666666665, ans=0.04949747468305833 2023-11-27 20:15:17,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3218560.0, ans=0.125 2023-11-27 20:15:33,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-27 20:15:40,755 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1850, loss[loss=0.07584, simple_loss=0.1075, pruned_loss=0.01474, audio_tagging_loss=0.007334, over 14921.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09052, pruned_loss=0.01253, audio_tagging_loss=0.008707, over 3043758.99 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:15:41,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3218693.3333333335, ans=0.1 2023-11-27 20:15:41,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2023-11-27 20:15:56,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3218760.0, ans=0.0 2023-11-27 20:15:58,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. 
limit=22.5 2023-11-27 20:16:04,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3218826.6666666665, ans=0.125 2023-11-27 20:16:13,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2023-11-27 20:16:13,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3218893.3333333335, ans=0.125 2023-11-27 20:16:18,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.712e+01 9.397e+01 9.825e+01 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 20:16:31,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3218960.0, ans=0.0 2023-11-27 20:16:31,983 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-27 20:16:38,530 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1900, loss[loss=0.06659, simple_loss=0.08311, pruned_loss=0.01543, audio_tagging_loss=0.009606, over 16325.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0901, pruned_loss=0.01223, audio_tagging_loss=0.008615, over 3056989.53 frames. ], batch size: 63, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:17:29,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-27 20:17:35,837 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 1950, loss[loss=0.0615, simple_loss=0.07868, pruned_loss=0.01082, audio_tagging_loss=0.01134, over 14895.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09036, pruned_loss=0.01238, audio_tagging_loss=0.008616, over 3051966.53 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:17:38,298 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:17:45,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3219360.0, ans=0.0 2023-11-27 20:18:00,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3219493.3333333335, ans=0.125 2023-11-27 20:18:08,090 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:18:15,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.669e+01 9.288e+01 9.966e+01 1.212e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 20:18:16,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3219560.0, ans=0.125 2023-11-27 20:18:27,137 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-27 20:18:34,222 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2000, loss[loss=0.07836, simple_loss=0.1122, pruned_loss=0.01579, audio_tagging_loss=0.006466, over 14533.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0897, pruned_loss=0.01259, audio_tagging_loss=0.008657, over 3038301.20 frames. 
], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:18:38,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3219693.3333333335, ans=0.0 2023-11-27 20:18:38,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-27 20:18:44,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3219693.3333333335, ans=0.1 2023-11-27 20:19:00,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3219826.6666666665, ans=0.0 2023-11-27 20:19:08,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3219893.3333333335, ans=0.0 2023-11-27 20:19:15,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3219893.3333333335, ans=0.0 2023-11-27 20:19:25,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-27 20:19:32,547 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2050, loss[loss=0.07462, simple_loss=0.1146, pruned_loss=0.0102, audio_tagging_loss=0.007109, over 15321.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09013, pruned_loss=0.01261, audio_tagging_loss=0.008574, over 3042867.93 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:19:47,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3220093.3333333335, ans=0.1 2023-11-27 20:19:47,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.35 vs. limit=12.0 2023-11-27 20:20:04,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3220160.0, ans=0.0 2023-11-27 20:20:12,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.893e+01 9.583e+01 1.011e+02 1.256e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 20:20:17,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-27 20:20:23,001 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-27 20:20:29,583 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2100, loss[loss=0.05372, simple_loss=0.06493, pruned_loss=0.01094, audio_tagging_loss=0.01031, over 13555.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.0894, pruned_loss=0.0125, audio_tagging_loss=0.008627, over 3043387.72 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:20:32,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2023-11-27 20:21:00,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3220493.3333333335, ans=0.125 2023-11-27 20:21:12,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. 
limit=15.0 2023-11-27 20:21:17,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3220626.6666666665, ans=0.125 2023-11-27 20:21:20,573 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-27 20:21:21,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3220626.6666666665, ans=0.125 2023-11-27 20:21:27,433 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2150, loss[loss=0.05054, simple_loss=0.06661, pruned_loss=0.00801, audio_tagging_loss=0.00922, over 16499.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08919, pruned_loss=0.01248, audio_tagging_loss=0.008626, over 3042852.20 frames. ], batch size: 65, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:21:54,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3220826.6666666665, ans=10.0 2023-11-27 20:21:54,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3220826.6666666665, ans=0.0 2023-11-27 20:22:03,883 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:22:07,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.704e+01 9.254e+01 9.792e+01 1.378e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 20:22:07,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3220893.3333333335, ans=0.0 2023-11-27 20:22:17,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-27 20:22:25,333 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2200, loss[loss=0.04767, simple_loss=0.06803, pruned_loss=0.006111, audio_tagging_loss=0.00755, over 15141.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08956, pruned_loss=0.01252, audio_tagging_loss=0.008693, over 3043974.93 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:22:42,215 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:22:45,566 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:22:47,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3221160.0, ans=0.0 2023-11-27 20:22:57,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3221160.0, ans=0.02 2023-11-27 20:23:16,036 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-27 20:23:23,024 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2250, loss[loss=0.07617, simple_loss=0.1029, pruned_loss=0.01682, audio_tagging_loss=0.007914, over 15137.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09047, pruned_loss=0.01269, audio_tagging_loss=0.008703, over 3042988.50 frames. 
], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:23:53,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3221493.3333333335, ans=0.125 2023-11-27 20:24:03,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.930e+01 9.422e+01 1.015e+02 1.618e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 20:24:04,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0 2023-11-27 20:24:08,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-11-27 20:24:14,062 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-27 20:24:16,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3221626.6666666665, ans=0.5 2023-11-27 20:24:20,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3221693.3333333335, ans=0.0 2023-11-27 20:24:21,365 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2300, loss[loss=0.07461, simple_loss=0.09699, pruned_loss=0.01589, audio_tagging_loss=0.01022, over 14781.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09018, pruned_loss=0.01273, audio_tagging_loss=0.008729, over 3045689.14 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:24:27,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=22.5 2023-11-27 20:25:11,984 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-27 20:25:14,177 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:25:14,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3221960.0, ans=0.125 2023-11-27 20:25:19,106 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2350, loss[loss=0.06442, simple_loss=0.09001, pruned_loss=0.01137, audio_tagging_loss=0.00805, over 15161.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08965, pruned_loss=0.01258, audio_tagging_loss=0.00886, over 3042992.11 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:25:37,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3222093.3333333335, ans=0.07 2023-11-27 20:25:52,639 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:25:58,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.811e+01 9.279e+01 1.007e+02 1.436e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 20:26:09,509 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-27 20:26:16,847 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2400, loss[loss=0.07958, simple_loss=0.1108, pruned_loss=0.01621, audio_tagging_loss=0.007983, over 16049.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09041, pruned_loss=0.01266, audio_tagging_loss=0.008954, over 3046669.63 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:26:18,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3222360.0, ans=0.125 2023-11-27 20:26:20,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3222360.0, ans=0.0 2023-11-27 20:26:37,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=15.0 2023-11-27 20:26:39,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3222493.3333333335, ans=0.125 2023-11-27 20:26:51,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3222560.0, ans=0.1 2023-11-27 20:27:04,747 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:27:07,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-27 20:27:07,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3222626.6666666665, ans=0.125 2023-11-27 20:27:15,127 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2450, loss[loss=0.04953, simple_loss=0.0627, pruned_loss=0.007839, audio_tagging_loss=0.01034, over 15639.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08981, pruned_loss=0.01254, audio_tagging_loss=0.009042, over 3049804.12 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:27:23,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3222693.3333333335, ans=0.0 2023-11-27 20:27:27,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. 
limit=15.0 2023-11-27 20:27:35,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3222760.0, ans=0.2 2023-11-27 20:27:48,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3222893.3333333335, ans=0.0 2023-11-27 20:27:56,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.599e+01 9.201e+01 9.948e+01 1.437e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-27 20:28:05,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3222960.0, ans=0.0 2023-11-27 20:28:06,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-27 20:28:12,697 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2500, loss[loss=0.05788, simple_loss=0.0765, pruned_loss=0.01079, audio_tagging_loss=0.00884, over 14335.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08934, pruned_loss=0.01245, audio_tagging_loss=0.009032, over 3046677.96 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:28:13,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5 2023-11-27 20:28:15,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3223026.6666666665, ans=0.125 2023-11-27 20:28:16,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5 2023-11-27 20:28:52,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223226.6666666665, ans=0.1 2023-11-27 20:29:00,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3223293.3333333335, ans=0.125 2023-11-27 20:29:04,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-27 20:29:10,719 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2550, loss[loss=0.07722, simple_loss=0.1156, pruned_loss=0.01452, audio_tagging_loss=0.004878, over 15913.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08914, pruned_loss=0.01245, audio_tagging_loss=0.008976, over 3045562.91 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:29:33,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3223493.3333333335, ans=0.125 2023-11-27 20:29:51,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3223560.0, ans=0.125 2023-11-27 20:29:52,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.657e+01 9.326e+01 1.025e+02 1.204e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 20:29:58,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3223626.6666666665, ans=0.1 2023-11-27 20:29:59,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3223626.6666666665, ans=0.0 2023-11-27 20:30:01,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-27 20:30:02,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-27 20:30:08,859 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2600, loss[loss=0.05486, simple_loss=0.06685, pruned_loss=0.01004, audio_tagging_loss=0.01139, over 14233.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08843, pruned_loss=0.01223, audio_tagging_loss=0.008935, over 3044637.67 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:30:20,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3223760.0, ans=0.0 2023-11-27 20:30:22,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3223760.0, ans=0.0 2023-11-27 20:30:59,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-27 20:31:00,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3223960.0, ans=0.2 2023-11-27 20:31:06,374 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2650, loss[loss=0.04758, simple_loss=0.05572, pruned_loss=0.008702, audio_tagging_loss=0.01102, over 13724.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0883, pruned_loss=0.01226, audio_tagging_loss=0.008803, over 3043519.48 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:31:15,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3224026.6666666665, ans=0.125 2023-11-27 20:31:21,712 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:31:36,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3224160.0, ans=0.125 2023-11-27 20:31:48,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.676e+01 9.510e+01 9.992e+01 1.225e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 20:31:57,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-27 20:31:59,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3224293.3333333335, ans=0.0 2023-11-27 20:32:04,332 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2700, loss[loss=0.07219, simple_loss=0.09954, pruned_loss=0.01374, audio_tagging_loss=0.008678, over 15657.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08924, pruned_loss=0.0125, audio_tagging_loss=0.008782, over 3041252.68 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:32:09,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3224360.0, ans=0.125 2023-11-27 20:32:11,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3224360.0, ans=0.125 2023-11-27 20:32:16,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=12.0 2023-11-27 20:32:41,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3224560.0, ans=0.0 2023-11-27 20:32:46,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3224560.0, ans=0.0 2023-11-27 20:32:46,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3224560.0, ans=0.07 2023-11-27 20:32:53,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3224626.6666666665, ans=0.125 2023-11-27 20:32:55,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-27 20:33:02,273 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2750, loss[loss=0.06239, simple_loss=0.0822, pruned_loss=0.01244, audio_tagging_loss=0.008847, over 15292.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08889, pruned_loss=0.01238, audio_tagging_loss=0.008823, over 3039078.38 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:33:06,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. 
limit=15.0 2023-11-27 20:33:34,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3224826.6666666665, ans=0.1 2023-11-27 20:33:43,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.556e+01 9.189e+01 9.890e+01 1.172e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 20:33:45,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-27 20:33:46,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3224893.3333333335, ans=0.125 2023-11-27 20:33:49,162 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:33:51,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2023-11-27 20:33:53,804 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:33:53,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-27 20:34:00,311 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2800, loss[loss=0.06422, simple_loss=0.1021, pruned_loss=0.006209, audio_tagging_loss=0.006939, over 15885.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08815, pruned_loss=0.01219, audio_tagging_loss=0.008867, over 3035539.63 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:34:12,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-27 20:34:15,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3225093.3333333335, ans=0.125 2023-11-27 20:34:18,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3225093.3333333335, ans=0.125 2023-11-27 20:34:25,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-27 20:34:29,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3225160.0, ans=0.125 2023-11-27 20:34:35,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-27 20:34:37,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3225226.6666666665, ans=0.0 2023-11-27 20:34:40,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.89 vs. 
limit=10.0 2023-11-27 20:34:45,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2023-11-27 20:34:51,466 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-27 20:34:51,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-27 20:34:54,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-11-27 20:34:58,495 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2850, loss[loss=0.03808, simple_loss=0.04035, pruned_loss=0.004489, audio_tagging_loss=0.01342, over 16703.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08819, pruned_loss=0.01235, audio_tagging_loss=0.008834, over 3040871.56 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:35:06,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-27 20:35:10,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3225426.6666666665, ans=0.0 2023-11-27 20:35:23,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3225493.3333333335, ans=0.125 2023-11-27 20:35:31,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3225560.0, ans=0.0 2023-11-27 20:35:41,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.834e+01 9.311e+01 1.027e+02 1.174e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 20:35:48,737 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-27 20:35:52,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3225626.6666666665, ans=0.125 2023-11-27 20:35:55,319 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2900, loss[loss=0.05624, simple_loss=0.07975, pruned_loss=0.008134, audio_tagging_loss=0.008226, over 16133.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08845, pruned_loss=0.0124, audio_tagging_loss=0.008735, over 3040595.96 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:36:13,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3225760.0, ans=0.0 2023-11-27 20:36:46,559 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-27 20:36:52,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3226026.6666666665, ans=0.0 2023-11-27 20:36:53,786 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 2950, loss[loss=0.04701, simple_loss=0.04638, pruned_loss=0.009741, audio_tagging_loss=0.01408, over 14874.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08906, pruned_loss=0.01244, audio_tagging_loss=0.008801, over 3050316.91 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:37:02,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3226026.6666666665, ans=0.125 2023-11-27 20:37:06,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226093.3333333335, ans=0.1 2023-11-27 20:37:07,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3226093.3333333335, ans=0.125 2023-11-27 20:37:14,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3226093.3333333335, ans=0.125 2023-11-27 20:37:27,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3226226.6666666665, ans=22.5 2023-11-27 20:37:33,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3226226.6666666665, ans=0.2 2023-11-27 20:37:36,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.664e+01 9.410e+01 9.930e+01 1.488e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 20:37:44,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-27 20:37:51,788 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3000, loss[loss=0.05813, simple_loss=0.08682, pruned_loss=0.00739, audio_tagging_loss=0.007334, over 14601.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08988, pruned_loss=0.01263, audio_tagging_loss=0.008767, over 3056430.88 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:37:51,789 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-27 20:38:10,309 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6717, 2.7915, 2.5998, 2.5366, 3.1635, 3.0995, 3.1916, 3.2036], device='cuda:1') 2023-11-27 20:38:15,800 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7948, 5.8379, 5.8937, 5.8721], device='cuda:1') 2023-11-27 20:38:26,081 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.0572, simple_loss=0.05061, pruned_loss=0.005192, audio_tagging_loss=0.0267, over 4681554.00 frames. 2023-11-27 20:38:26,082 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 20:38:58,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3226493.3333333335, ans=0.125 2023-11-27 20:39:04,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.24 vs. limit=15.0 2023-11-27 20:39:11,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3226626.6666666665, ans=0.125 2023-11-27 20:39:17,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-27 20:39:26,484 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3050, loss[loss=0.06483, simple_loss=0.08316, pruned_loss=0.01327, audio_tagging_loss=0.009986, over 16104.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09034, pruned_loss=0.01273, audio_tagging_loss=0.008884, over 3054120.33 frames. 
], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:39:32,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3226693.3333333335, ans=0.125 2023-11-27 20:39:47,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3226760.0, ans=0.125 2023-11-27 20:39:51,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=15.0 2023-11-27 20:39:57,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3226826.6666666665, ans=0.2 2023-11-27 20:40:00,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2023-11-27 20:40:00,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-27 20:40:01,256 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:40:01,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3226893.3333333335, ans=0.125 2023-11-27 20:40:06,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226893.3333333335, ans=0.1 2023-11-27 20:40:09,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.867e+01 9.400e+01 1.012e+02 1.240e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 20:40:12,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3226960.0, ans=0.0 2023-11-27 20:40:15,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3226960.0, ans=0.125 2023-11-27 20:40:17,840 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-27 20:40:20,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3226960.0, ans=0.0 2023-11-27 20:40:24,363 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3100, loss[loss=0.07978, simple_loss=0.1127, pruned_loss=0.01453, audio_tagging_loss=0.008885, over 16103.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09082, pruned_loss=0.01285, audio_tagging_loss=0.008942, over 3050254.01 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:40:31,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3227026.6666666665, ans=0.0 2023-11-27 20:40:37,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3227093.3333333335, ans=0.125 2023-11-27 20:40:56,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3227160.0, ans=0.2 2023-11-27 20:41:06,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3227226.6666666665, ans=0.5 2023-11-27 20:41:07,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0 2023-11-27 20:41:14,732 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-27 20:41:19,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2023-11-27 20:41:21,309 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3150, loss[loss=0.06069, simple_loss=0.07744, pruned_loss=0.01276, audio_tagging_loss=0.009208, over 14964.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09028, pruned_loss=0.01263, audio_tagging_loss=0.00906, over 3039583.08 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:41:24,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3227360.0, ans=0.125 2023-11-27 20:41:42,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3227426.6666666665, ans=0.025 2023-11-27 20:41:58,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3227560.0, ans=0.125 2023-11-27 20:42:04,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.840e+01 9.395e+01 9.954e+01 1.405e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 20:42:04,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3227560.0, ans=0.125 2023-11-27 20:42:12,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-27 20:42:15,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3227626.6666666665, ans=0.0 2023-11-27 20:42:19,254 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3200, loss[loss=0.0874, simple_loss=0.1217, pruned_loss=0.02006, audio_tagging_loss=0.006501, over 14752.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09036, pruned_loss=0.01258, audio_tagging_loss=0.009029, over 3037139.56 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:42:23,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3227693.3333333335, ans=10.0 2023-11-27 20:42:36,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3227760.0, ans=0.0 2023-11-27 20:42:40,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-11-27 20:42:50,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227826.6666666665, ans=0.125 2023-11-27 20:43:01,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227893.3333333335, ans=0.125 2023-11-27 20:43:06,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3227960.0, ans=0.1 2023-11-27 20:43:11,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-27 20:43:18,072 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3250, loss[loss=0.07321, simple_loss=0.1035, pruned_loss=0.01507, audio_tagging_loss=0.006404, over 15247.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08984, pruned_loss=0.01246, audio_tagging_loss=0.009083, over 3035450.35 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:43:22,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2023-11-27 20:43:27,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3228026.6666666665, ans=0.2 2023-11-27 20:43:44,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3228160.0, ans=0.125 2023-11-27 20:43:47,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3228160.0, ans=0.125 2023-11-27 20:43:47,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3228160.0, ans=0.2 2023-11-27 20:43:51,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3228226.6666666665, ans=0.125 2023-11-27 20:43:51,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3228226.6666666665, ans=0.5 2023-11-27 20:43:55,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3228226.6666666665, ans=0.125 2023-11-27 20:44:00,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.543e+01 9.307e+01 1.025e+02 1.528e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 20:44:08,492 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-27 20:44:14,948 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3300, loss[loss=0.07785, simple_loss=0.09516, pruned_loss=0.01975, audio_tagging_loss=0.01052, over 16053.00 frames. 
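
The bracketed loss fields in the `train_asr.py:1235` records are consistent with a fixed weighted sum of the three components. The scales below are assumptions, chosen because they reproduce the printed totals (checked here against the batch 3250 averages above):

```python
simple_loss_scale = 0.5         # assumed
audio_tagging_loss_scale = 1.0  # assumed

def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float) -> float:
    # loss = 0.5 * simple + pruned + 1.0 * audio_tagging (assumed weighting)
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# tot_loss at batch 3250 above: loss=0.06647
print(total_loss(0.08984, 0.01246, 0.009083))  # -> 0.066463
```
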
], tot_loss[loss=0.0665, simple_loss=0.08968, pruned_loss=0.01249, audio_tagging_loss=0.009171, over 3041726.60 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:44:52,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-11-27 20:44:58,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-27 20:45:06,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-27 20:45:06,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3228626.6666666665, ans=0.015 2023-11-27 20:45:09,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228626.6666666665, ans=0.1 2023-11-27 20:45:12,793 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3350, loss[loss=0.06782, simple_loss=0.09297, pruned_loss=0.01391, audio_tagging_loss=0.007426, over 15495.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09057, pruned_loss=0.01271, audio_tagging_loss=0.008979, over 3048771.81 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:45:25,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3228760.0, ans=0.125 2023-11-27 20:45:32,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3228760.0, ans=0.5 2023-11-27 20:45:43,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3228826.6666666665, ans=0.125 2023-11-27 20:45:55,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.657e+01 9.246e+01 1.011e+02 1.317e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 20:46:02,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484350 2023-11-27 20:46:10,128 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3400, loss[loss=0.08186, simple_loss=0.1144, pruned_loss=0.01781, audio_tagging_loss=0.006855, over 15338.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09081, pruned_loss=0.01276, audio_tagging_loss=0.008845, over 3050383.70 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:46:14,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3229026.6666666665, ans=0.1 2023-11-27 20:46:17,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3229026.6666666665, ans=0.0 2023-11-27 20:46:22,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3229093.3333333335, ans=0.0 2023-11-27 20:47:00,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484400 2023-11-27 20:47:07,146 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3450, loss[loss=0.0501, simple_loss=0.06249, pruned_loss=0.007708, audio_tagging_loss=0.01115, over 16741.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09059, pruned_loss=0.01279, audio_tagging_loss=0.008826, over 3052035.56 frames. 
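
Each `optim.py:476` record prints five grad-norm statistics (reading them as min / 25% / median / 75% / max over a recent window) plus a clipping threshold; with `Clipping_scale=2.0` the threshold tracks twice the median (9.246e+01 -> 1.849e+02 in the record above). A sketch of that scheme; the window size is an assumption:

```python
import torch

clipping_scale = 2.0
recent_norms = []                    # grad norms of recent batches

def clip_gradients(params) -> bool:
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
    recent_norms.append(norm)
    del recent_norms[:-128]          # sliding window; 128 is assumed
    median = torch.tensor(recent_norms).quantile(0.5).item()
    threshold = clipping_scale * median
    if norm > threshold:
        for g in grads:
            g.mul_(threshold / norm)   # scale gradients down to the threshold
        return True                    # counted into "percent-clipped"
    return False
```
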
], batch size: 64, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:47:09,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5 2023-11-27 20:47:18,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3229426.6666666665, ans=0.125 2023-11-27 20:47:25,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3229426.6666666665, ans=0.2 2023-11-27 20:47:27,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3229426.6666666665, ans=0.0 2023-11-27 20:47:35,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3229493.3333333335, ans=0.2 2023-11-27 20:47:43,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3229560.0, ans=0.0 2023-11-27 20:47:50,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.393e+01 9.066e+01 9.893e+01 1.377e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-27 20:47:57,372 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484450 2023-11-27 20:48:04,375 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3500, loss[loss=0.05746, simple_loss=0.07391, pruned_loss=0.009346, audio_tagging_loss=0.01116, over 13108.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08949, pruned_loss=0.01255, audio_tagging_loss=0.008877, over 3046577.09 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:48:27,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=12.0 2023-11-27 20:48:28,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-27 20:48:31,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-27 20:48:35,203 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 20:48:41,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3229893.3333333335, ans=0.2 2023-11-27 20:48:43,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3229893.3333333335, ans=0.2 2023-11-27 20:48:45,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3229893.3333333335, ans=0.125 2023-11-27 20:48:52,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3229960.0, ans=0.2 2023-11-27 20:48:54,660 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484500 2023-11-27 20:48:58,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3229960.0, ans=0.1 2023-11-27 20:49:01,727 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3550, loss[loss=0.06859, simple_loss=0.08839, pruned_loss=0.01687, audio_tagging_loss=0.007533, over 14303.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09001, pruned_loss=0.01261, audio_tagging_loss=0.008898, over 3048012.44 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:49:09,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3230026.6666666665, ans=0.125 2023-11-27 20:49:14,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3230093.3333333335, ans=0.125 2023-11-27 20:49:16,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-27 20:49:45,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.597e+01 9.146e+01 1.002e+02 1.167e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 20:49:47,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2023-11-27 20:49:52,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-27 20:49:59,412 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3600, loss[loss=0.09394, simple_loss=0.1293, pruned_loss=0.02293, audio_tagging_loss=0.006379, over 15855.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09054, pruned_loss=0.01269, audio_tagging_loss=0.008795, over 3047187.46 frames. 
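
The WARNING records ("Exclude cut with ID ...") all describe 1-second clips from the unbalanced subset: 100 input frames shrink to 23 after the encoder's roughly 4x subsampling, while the placeholder transcript tokenises to 24 units. A transducer loss needs at least as many encoder frames as output tokens, so such cuts are dropped. A sketch; the subsampling formula is an assumption that happens to reproduce the logged 100 -> 23 mapping:

```python
def frames_after_subsampling(t: int) -> int:
    # two stride-2 convolutions in the encoder frontend (assumed)
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # pruned RNN-T needs at least one encoder frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23, as in the warnings
print(keep_cut(100, 24))              # -> False: 23 frames < 24 tokens
```
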
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:14,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3230426.6666666665, ans=0.125 2023-11-27 20:50:21,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3230493.3333333335, ans=0.05 2023-11-27 20:50:23,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3230493.3333333335, ans=0.125 2023-11-27 20:50:24,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3230493.3333333335, ans=0.0 2023-11-27 20:50:28,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3230493.3333333335, ans=0.0 2023-11-27 20:50:28,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-11-27 20:50:38,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3230560.0, ans=0.2 2023-11-27 20:50:49,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-27 20:50:52,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230626.6666666665, ans=0.1 2023-11-27 20:50:56,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3230693.3333333335, ans=0.125 2023-11-27 20:50:57,223 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3650, loss[loss=0.0549, simple_loss=0.07555, pruned_loss=0.006853, audio_tagging_loss=0.01027, over 14880.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08971, pruned_loss=0.01253, audio_tagging_loss=0.008797, over 3052246.22 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:58,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230693.3333333335, ans=0.1 2023-11-27 20:51:09,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3230760.0, ans=0.125 2023-11-27 20:51:41,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.915e+01 9.732e+01 1.035e+02 1.318e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 20:51:44,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3230960.0, ans=0.04949747468305833 2023-11-27 20:51:47,444 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-27 20:51:51,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3230960.0, ans=0.2 2023-11-27 20:51:51,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3230960.0, ans=0.5 2023-11-27 20:51:53,887 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3700, loss[loss=0.0811, simple_loss=0.1144, pruned_loss=0.01636, audio_tagging_loss=0.007536, over 14576.00 frames. 
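
The `model.py:807` records tick at a fixed cadence: the printed batch index advances by exactly 50 each time (484550, 484600, ...), i.e. they sit on the usual log-interval boundary. A trivial sketch, with names assumed:

```python
import logging

log_interval = 50        # assumed from the 50-batch cadence of the records
freeze_encoder = False

def maybe_log_state(batch_idx_train: int) -> None:
    # emit the periodic "Freeze_encoder: ...; Current batch idx: ..." line
    if batch_idx_train % log_interval == 0:
        logging.info("Freeze_encoder: %s; Current batch idx: %d",
                     freeze_encoder, batch_idx_train)
```
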
], tot_loss[loss=0.06607, simple_loss=0.0895, pruned_loss=0.01254, audio_tagging_loss=0.008776, over 3051922.20 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:52:37,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3231226.6666666665, ans=0.0 2023-11-27 20:52:45,190 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-27 20:52:47,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3231293.3333333335, ans=0.125 2023-11-27 20:52:51,692 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3750, loss[loss=0.07032, simple_loss=0.0848, pruned_loss=0.01552, audio_tagging_loss=0.0124, over 14291.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.0893, pruned_loss=0.01234, audio_tagging_loss=0.0088, over 3050607.51 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:11,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3231426.6666666665, ans=0.0 2023-11-27 20:53:17,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-27 20:53:18,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-27 20:53:24,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-27 20:53:25,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3231560.0, ans=0.0 2023-11-27 20:53:29,140 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:53:33,178 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:53:36,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.777e+01 9.406e+01 1.027e+02 1.290e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 20:53:42,634 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-27 20:53:45,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3231626.6666666665, ans=0.125 2023-11-27 20:53:48,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3231693.3333333335, ans=0.125 2023-11-27 20:53:49,564 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3800, loss[loss=0.08224, simple_loss=0.1063, pruned_loss=0.01861, audio_tagging_loss=0.01047, over 15344.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08953, pruned_loss=0.01239, audio_tagging_loss=0.008936, over 3055629.88 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:52,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3231693.3333333335, ans=15.0 2023-11-27 20:53:58,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2023-11-27 20:54:34,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3231960.0, ans=0.0 2023-11-27 20:54:39,512 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-27 20:54:42,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3231960.0, ans=0.125 2023-11-27 20:54:44,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.01 vs. limit=10.0 2023-11-27 20:54:46,248 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3850, loss[loss=0.04927, simple_loss=0.06106, pruned_loss=0.0102, audio_tagging_loss=0.008549, over 14728.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08889, pruned_loss=0.01234, audio_tagging_loss=0.009004, over 3052174.05 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:54:59,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3232093.3333333335, ans=0.0 2023-11-27 20:55:08,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3232160.0, ans=0.125 2023-11-27 20:55:10,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3232160.0, ans=0.0 2023-11-27 20:55:12,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=22.5 2023-11-27 20:55:15,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3232160.0, ans=0.09899494936611666 2023-11-27 20:55:31,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.660e+01 9.334e+01 1.001e+02 1.347e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 20:55:35,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3232293.3333333335, ans=0.125 2023-11-27 20:55:37,219 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-27 20:55:37,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3232293.3333333335, ans=0.125 2023-11-27 20:55:43,747 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3900, loss[loss=0.07181, simple_loss=0.0994, pruned_loss=0.01218, audio_tagging_loss=0.009927, over 14492.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09069, pruned_loss=0.01261, audio_tagging_loss=0.00894, over 3047067.78 frames. 
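
Across these records the `grad_scale` field flips between 16.0 and 32.0, the signature of dynamic fp16 loss scaling: the scaler halves its scale when a batch overflows and doubles it again after a run of clean steps, so the logged value oscillates between nearby powers of two. A minimal sketch of that loop using torch's stock `GradScaler` (the `model`, `criterion` and batch names are placeholders):

```python
import torch

scaler = torch.cuda.amp.GradScaler()   # grows/shrinks the scale dynamically

def train_step(model, criterion, optimizer, features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # fp16 forward pass
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()               # backward on the scaled loss
    scaler.step(optimizer)                      # unscales; skips step on inf/nan
    scaler.update()                             # adjust scale for the next batch
    return loss.detach(), scaler.get_scale()    # the logged grad_scale
```
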
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:55:48,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3232360.0, ans=0.125 2023-11-27 20:55:51,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3232360.0, ans=0.0 2023-11-27 20:56:02,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-27 20:56:20,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3232560.0, ans=0.125 2023-11-27 20:56:23,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3232560.0, ans=0.125 2023-11-27 20:56:31,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3232626.6666666665, ans=0.0 2023-11-27 20:56:34,434 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-27 20:56:35,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3232626.6666666665, ans=0.125 2023-11-27 20:56:40,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3232693.3333333335, ans=0.0 2023-11-27 20:56:42,121 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 3950, loss[loss=0.08508, simple_loss=0.1129, pruned_loss=0.02129, audio_tagging_loss=0.007336, over 14795.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09006, pruned_loss=0.01242, audio_tagging_loss=0.009004, over 3050740.19 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:57:03,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3232760.0, ans=0.125 2023-11-27 20:57:26,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.571e+01 9.462e+01 1.016e+02 1.341e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 20:57:32,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-27 20:57:39,308 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4000, loss[loss=0.08417, simple_loss=0.1135, pruned_loss=0.02151, audio_tagging_loss=0.005907, over 14637.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09062, pruned_loss=0.0126, audio_tagging_loss=0.009012, over 3051764.46 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:57:39,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-27 20:58:02,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3233160.0, ans=0.125 2023-11-27 20:58:07,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. 
limit=15.0 2023-11-27 20:58:10,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3233160.0, ans=0.125 2023-11-27 20:58:27,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3233293.3333333335, ans=0.0 2023-11-27 20:58:29,442 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-27 20:58:32,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3233293.3333333335, ans=0.1 2023-11-27 20:58:36,271 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4050, loss[loss=0.06952, simple_loss=0.09431, pruned_loss=0.0141, audio_tagging_loss=0.008271, over 14124.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09096, pruned_loss=0.01268, audio_tagging_loss=0.009002, over 3046734.58 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:58:40,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-27 20:58:40,684 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:58:48,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3233426.6666666665, ans=0.025 2023-11-27 20:58:58,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3233493.3333333335, ans=0.125 2023-11-27 20:59:10,936 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:59:12,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3233560.0, ans=0.125 2023-11-27 20:59:20,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 8.859e+01 9.526e+01 1.036e+02 1.251e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 20:59:25,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-27 20:59:25,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3233626.6666666665, ans=0.125 2023-11-27 20:59:32,427 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4100, loss[loss=0.05055, simple_loss=0.06884, pruned_loss=0.009584, audio_tagging_loss=0.006549, over 15191.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09094, pruned_loss=0.01254, audio_tagging_loss=0.008928, over 3047009.09 frames. 
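
Every training record in this stretch prints `lr: 1.65e-03`. That value is consistent with an Eden-style schedule that decays with both batch count and epoch. All constants below are assumptions, including treating the scheduler's epoch as the number of completed epochs (40 while epoch 41 trains); they are chosen because they reproduce the printed value:

```python
def eden_lr(batch: int, epoch: int,
            base_lr: float = 0.045,            # assumed
            lr_batches: float = 7500.0,        # assumed
            lr_epochs: float = 3.5) -> float:  # assumed
    # decay by the fourth root of (1 + (t/T)^2) in both batch and epoch
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(batch=484600, epoch=40):.2e}")  # -> 1.65e-03
```

At this depth into training the batch and epoch factors change very slowly, which is why the printed learning rate stays pinned at 1.65e-03 across thousands of batches.
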
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:59:41,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3233693.3333333335, ans=0.0 2023-11-27 20:59:55,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3233826.6666666665, ans=0.125 2023-11-27 20:59:56,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3233826.6666666665, ans=0.1 2023-11-27 21:00:09,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-27 21:00:23,224 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-27 21:00:24,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3233960.0, ans=0.0 2023-11-27 21:00:30,332 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4150, loss[loss=0.07404, simple_loss=0.1102, pruned_loss=0.01322, audio_tagging_loss=0.005735, over 15286.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09104, pruned_loss=0.01256, audio_tagging_loss=0.008838, over 3050420.89 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:00:44,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234093.3333333335, ans=0.1 2023-11-27 21:01:05,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3234226.6666666665, ans=0.2 2023-11-27 21:01:09,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2023-11-27 21:01:13,448 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:01:15,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.501e+01 9.252e+01 1.004e+02 1.216e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 21:01:20,638 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-27 21:01:24,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234293.3333333335, ans=0.1 2023-11-27 21:01:27,071 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4200, loss[loss=0.07897, simple_loss=0.1087, pruned_loss=0.01602, audio_tagging_loss=0.008624, over 15677.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09117, pruned_loss=0.01244, audio_tagging_loss=0.008677, over 3055675.14 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:01:31,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3234360.0, ans=0.2 2023-11-27 21:01:46,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.64 vs. 
limit=12.0 2023-11-27 21:01:48,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-27 21:01:48,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3234426.6666666665, ans=0.04949747468305833 2023-11-27 21:01:53,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-27 21:02:02,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234560.0, ans=0.1 2023-11-27 21:02:04,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234560.0, ans=0.1 2023-11-27 21:02:11,070 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:02:17,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-27 21:02:24,405 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4250, loss[loss=0.06546, simple_loss=0.06803, pruned_loss=0.019, audio_tagging_loss=0.01244, over 15166.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09149, pruned_loss=0.01265, audio_tagging_loss=0.008765, over 3062555.16 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:03:08,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3234893.3333333335, ans=0.0 2023-11-27 21:03:10,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.065e+01 9.544e+01 1.011e+02 1.214e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 21:03:15,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-27 21:03:17,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-11-27 21:03:21,928 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4300, loss[loss=0.06607, simple_loss=0.09042, pruned_loss=0.01163, audio_tagging_loss=0.009223, over 14787.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09092, pruned_loss=0.01256, audio_tagging_loss=0.008736, over 3063505.29 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:03:23,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3235026.6666666665, ans=0.125 2023-11-27 21:03:34,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3235093.3333333335, ans=0.0 2023-11-27 21:03:40,933 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:04:05,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3235226.6666666665, ans=0.1 2023-11-27 21:04:12,595 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-27 21:04:19,758 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4350, loss[loss=0.06678, simple_loss=0.09017, pruned_loss=0.01317, audio_tagging_loss=0.008527, over 15963.00 frames. 
], tot_loss[loss=0.06668, simple_loss=0.09086, pruned_loss=0.01253, audio_tagging_loss=0.008712, over 3054129.53 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 21:04:42,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-27 21:04:49,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.33 vs. limit=10.0 2023-11-27 21:04:52,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3235560.0, ans=0.125 2023-11-27 21:04:52,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.57 vs. limit=22.5 2023-11-27 21:05:02,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3235560.0, ans=0.2 2023-11-27 21:05:06,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 9.007e+01 9.649e+01 1.037e+02 1.293e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 21:05:06,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3235626.6666666665, ans=0.1 2023-11-27 21:05:10,133 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-27 21:05:16,693 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4400, loss[loss=0.06683, simple_loss=0.0893, pruned_loss=0.01482, audio_tagging_loss=0.007359, over 14495.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09068, pruned_loss=0.01253, audio_tagging_loss=0.008669, over 3053400.10 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:05:19,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3235693.3333333335, ans=0.125 2023-11-27 21:05:34,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3235760.0, ans=0.125 2023-11-27 21:05:55,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3235893.3333333335, ans=0.5 2023-11-27 21:06:05,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-27 21:06:13,256 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4450, loss[loss=0.09057, simple_loss=0.12, pruned_loss=0.02162, audio_tagging_loss=0.008968, over 15301.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09074, pruned_loss=0.01258, audio_tagging_loss=0.00859, over 3058089.84 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:06:30,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3236093.3333333335, ans=0.2 2023-11-27 21:06:37,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3236160.0, ans=0.2 2023-11-27 21:06:39,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3236160.0, ans=0.125 2023-11-27 21:06:41,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. 
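
`tot_loss` is reported "over ~3.05e6 frames" even though each batch contributes only ~15k, which is consistent with a decaying accumulator: the previous sums shrink by a factor of (1 - 1/reset_interval) each batch before the new batch is added, giving an effective window of roughly `reset_interval` batches. With an assumed `reset_interval` of 200, the steady-state frame count matches these records:

```python
reset_interval = 200                        # assumed
tot = {"loss_sum": 0.0, "frames": 0.0}

def update_tot_loss(batch_loss_sum: float, batch_frames: float) -> float:
    decay = 1.0 - 1.0 / reset_interval
    tot["loss_sum"] = tot["loss_sum"] * decay + batch_loss_sum
    tot["frames"] = tot["frames"] * decay + batch_frames
    return tot["loss_sum"] / tot["frames"]  # the printed per-frame tot_loss

for _ in range(2000):                       # run to steady state
    update_tot_loss(0.067 * 15250, 15250)
print(f"{tot['frames']:.3e}")               # ~3.05e+06 frames, as logged
```
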
limit=15.0 2023-11-27 21:07:00,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.705e+01 9.463e+01 1.011e+02 1.177e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 21:07:03,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-27 21:07:10,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3236360.0, ans=0.2 2023-11-27 21:07:11,587 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4500, loss[loss=0.0565, simple_loss=0.0726, pruned_loss=0.01048, audio_tagging_loss=0.009712, over 14748.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09171, pruned_loss=0.01278, audio_tagging_loss=0.008537, over 3054426.39 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:07:23,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2023-11-27 21:07:30,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3236426.6666666665, ans=0.125 2023-11-27 21:07:37,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2023-11-27 21:07:38,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3236493.3333333335, ans=0.04949747468305833 2023-11-27 21:07:46,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3236560.0, ans=0.125 2023-11-27 21:07:47,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3236560.0, ans=0.0 2023-11-27 21:07:48,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3236560.0, ans=0.125 2023-11-27 21:08:01,667 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-27 21:08:02,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3236626.6666666665, ans=0.0 2023-11-27 21:08:08,300 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4550, loss[loss=0.06231, simple_loss=0.08828, pruned_loss=0.01097, audio_tagging_loss=0.007197, over 15411.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.0916, pruned_loss=0.01278, audio_tagging_loss=0.00853, over 3055667.01 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:08:08,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3236693.3333333335, ans=0.0 2023-11-27 21:08:22,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-11-27 21:08:26,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1 2023-11-27 21:08:26,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. 
limit=15.0 2023-11-27 21:08:31,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3236826.6666666665, ans=0.125 2023-11-27 21:08:46,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-27 21:08:52,683 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:08:54,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 8.582e+01 9.237e+01 9.730e+01 1.372e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 21:08:58,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-27 21:09:05,346 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4600, loss[loss=0.06156, simple_loss=0.09041, pruned_loss=0.009623, audio_tagging_loss=0.006729, over 14935.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09088, pruned_loss=0.0126, audio_tagging_loss=0.008687, over 3054438.43 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:09:05,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:09:47,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.06 vs. limit=22.5 2023-11-27 21:09:55,282 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-27 21:09:55,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3237293.3333333335, ans=0.0 2023-11-27 21:10:02,137 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4650, loss[loss=0.06372, simple_loss=0.09005, pruned_loss=0.009689, audio_tagging_loss=0.009008, over 14713.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09101, pruned_loss=0.01261, audio_tagging_loss=0.008782, over 3054218.02 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:10:09,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3237360.0, ans=0.125 2023-11-27 21:10:22,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.59 vs. 
limit=15.0 2023-11-27 21:10:41,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3237560.0, ans=0.2 2023-11-27 21:10:43,418 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:10:49,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.818e+01 9.409e+01 9.994e+01 1.817e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 21:10:52,676 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-27 21:10:59,634 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4700, loss[loss=0.06707, simple_loss=0.09673, pruned_loss=0.009177, audio_tagging_loss=0.00953, over 14050.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09115, pruned_loss=0.0127, audio_tagging_loss=0.008922, over 3049451.98 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:11:13,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2023-11-27 21:11:16,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2023-11-27 21:11:25,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3237826.6666666665, ans=0.125 2023-11-27 21:11:25,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2023-11-27 21:11:36,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3237893.3333333335, ans=0.125 2023-11-27 21:11:49,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-27 21:11:56,976 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4750, loss[loss=0.0842, simple_loss=0.1185, pruned_loss=0.01769, audio_tagging_loss=0.007268, over 15822.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09114, pruned_loss=0.01279, audio_tagging_loss=0.008916, over 3042542.58 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:11:58,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2023-11-27 21:11:58,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-27 21:12:00,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-27 21:12:43,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.911e+01 9.575e+01 1.033e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 21:12:46,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-27 21:12:53,380 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4800, loss[loss=0.04784, simple_loss=0.05755, pruned_loss=0.007307, audio_tagging_loss=0.01176, over 14588.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09107, pruned_loss=0.01266, audio_tagging_loss=0.009027, over 3041511.11 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:13:07,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3238426.6666666665, ans=0.0 2023-11-27 21:13:18,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3238493.3333333335, ans=0.125 2023-11-27 21:13:21,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3238493.3333333335, ans=0.125 2023-11-27 21:13:22,667 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:13:32,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3238560.0, ans=0.05 2023-11-27 21:13:36,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3238560.0, ans=0.0 2023-11-27 21:13:44,074 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-27 21:13:50,916 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4850, loss[loss=0.07868, simple_loss=0.1106, pruned_loss=0.01654, audio_tagging_loss=0.006824, over 15239.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09127, pruned_loss=0.01285, audio_tagging_loss=0.009105, over 3047532.47 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:13:51,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3238693.3333333335, ans=0.0 2023-11-27 21:13:51,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-27 21:14:33,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3238893.3333333335, ans=0.07 2023-11-27 21:14:39,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.891e+01 9.390e+01 1.010e+02 1.385e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 21:14:43,014 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-27 21:14:49,867 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4900, loss[loss=0.06546, simple_loss=0.09074, pruned_loss=0.01112, audio_tagging_loss=0.008967, over 14593.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.091, pruned_loss=0.01281, audio_tagging_loss=0.009073, over 3044811.20 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:14:53,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3239026.6666666665, ans=0.1 2023-11-27 21:15:35,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3239226.6666666665, ans=0.0 2023-11-27 21:15:47,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-27 21:15:52,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3239293.3333333335, ans=0.2 2023-11-27 21:15:57,747 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 4950, loss[loss=0.06309, simple_loss=0.08791, pruned_loss=0.01325, audio_tagging_loss=0.005888, over 16205.00 frames. 
], tot_loss[loss=0.06641, simple_loss=0.08981, pruned_loss=0.01251, audio_tagging_loss=0.009001, over 3039063.76 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:16:02,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=12.0
2023-11-27 21:16:10,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3239360.0, ans=0.0
2023-11-27 21:16:59,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3239560.0, ans=0.125
2023-11-27 21:17:01,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3239560.0, ans=0.025
2023-11-27 21:17:30,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.683e+01 9.334e+01 9.956e+01 1.191e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-27 21:17:35,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2023-11-27 21:17:36,530 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 485950
2023-11-27 21:17:48,911 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5000, loss[loss=0.05955, simple_loss=0.08186, pruned_loss=0.01067, audio_tagging_loss=0.007952, over 15952.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08982, pruned_loss=0.01243, audio_tagging_loss=0.008837, over 3040797.16 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:18:13,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239760.0, ans=0.1
2023-11-27 21:18:17,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3239760.0, ans=0.2
2023-11-27 21:18:38,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3239826.6666666665, ans=0.1
2023-11-27 21:18:40,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3239826.6666666665, ans=0.125
2023-11-27 21:18:44,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3239893.3333333335, ans=0.125
2023-11-27 21:18:48,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3239893.3333333335, ans=0.125
2023-11-27 21:18:48,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3239893.3333333335, ans=0.0
2023-11-27 21:18:54,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3239893.3333333335, ans=0.125
2023-11-27 21:19:05,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3239960.0, ans=0.125
2023-11-27 21:19:11,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486000
2023-11-27 21:19:22,932 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5050, loss[loss=0.09818, simple_loss=0.1376, pruned_loss=0.02143, audio_tagging_loss=0.00793, over 16175.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09005, pruned_loss=0.01245, audio_tagging_loss=0.008724, over 3036013.23 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:19:33,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3240026.6666666665, ans=0.125
2023-11-27 21:19:41,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0
2023-11-27 21:20:15,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3240160.0, ans=0.07
2023-11-27 21:21:17,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. limit=5.0
2023-11-27 21:21:24,904 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 21:22:11,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.669e+01 9.381e+01 9.908e+01 1.305e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-27 21:22:23,136 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486050
2023-11-27 21:22:25,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0
2023-11-27 21:22:53,926 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5100, loss[loss=0.06565, simple_loss=0.09012, pruned_loss=0.01184, audio_tagging_loss=0.008752, over 15973.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09142, pruned_loss=0.01276, audio_tagging_loss=0.008612, over 3036795.49 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:23:43,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3240426.6666666665, ans=0.0
2023-11-27 21:23:48,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3240426.6666666665, ans=0.125
2023-11-27 21:24:04,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3240426.6666666665, ans=0.125
2023-11-27 21:24:29,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3240493.3333333335, ans=0.125
2023-11-27 21:24:33,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0
2023-11-27 21:26:15,531 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486100
2023-11-27 21:26:47,678 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5150, loss[loss=0.07547, simple_loss=0.09865, pruned_loss=0.0138, audio_tagging_loss=0.01235, over 15680.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08991, pruned_loss=0.01255, audio_tagging_loss=0.008694, over 3042939.45 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:27:09,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3240693.3333333335, ans=0.2
2023-11-27 21:27:37,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3240760.0, ans=0.125
2023-11-27 21:29:58,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.896e+01 9.394e+01 1.012e+02 1.340e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 21:30:06,222 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486150
2023-11-27 21:30:17,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0
2023-11-27 21:30:28,683 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5200, loss[loss=0.0523, simple_loss=0.06945, pruned_loss=0.008578, audio_tagging_loss=0.008994, over 14986.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09008, pruned_loss=0.01251, audio_tagging_loss=0.008589, over 3042158.25 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 21:30:42,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3241026.6666666665, ans=0.0
2023-11-27 21:31:20,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3241093.3333333335, ans=0.0
2023-11-27 21:33:15,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3241293.3333333335, ans=0.07
2023-11-27 21:33:19,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486200
2023-11-27 21:33:37,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3241293.3333333335, ans=0.125
2023-11-27 21:33:45,356 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5250, loss[loss=0.04852, simple_loss=0.07329, pruned_loss=0.00442, audio_tagging_loss=0.007454, over 15096.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09, pruned_loss=0.01258, audio_tagging_loss=0.008602, over 3039991.10 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:34:38,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3241426.6666666665, ans=0.2
2023-11-27 21:35:06,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3241493.3333333335, ans=0.125
2023-11-27 21:35:10,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3241493.3333333335, ans=0.0
2023-11-27 21:35:35,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3241560.0, ans=0.125
2023-11-27 21:36:09,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.648e+01 9.300e+01 9.886e+01 1.149e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-27 21:36:11,544 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486250
2023-11-27 21:36:21,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3241626.6666666665, ans=0.0
2023-11-27 21:36:31,497 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5300, loss[loss=0.09039, simple_loss=0.1337, pruned_loss=0.01627, audio_tagging_loss=0.007242, over 15279.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09043, pruned_loss=0.01245, audio_tagging_loss=0.008525, over 3037617.55 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:36:36,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3241693.3333333335, ans=0.0
2023-11-27 21:37:52,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3241893.3333333335, ans=0.0
2023-11-27 21:37:59,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3241893.3333333335, ans=0.125
2023-11-27 21:38:29,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5
2023-11-27 21:38:41,091 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486300
2023-11-27 21:38:41,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3241960.0, ans=0.125
2023-11-27 21:38:47,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3241960.0, ans=0.0
2023-11-27 21:38:58,000 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5350, loss[loss=0.05805, simple_loss=0.08628, pruned_loss=0.008474, audio_tagging_loss=0.006438, over 14517.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08985, pruned_loss=0.01231, audio_tagging_loss=0.008606, over 3037512.86 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:40:57,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.785e+01 8.875e+01 9.269e+01 1.018e+02 1.292e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-27 21:40:59,814 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486350
2023-11-27 21:41:13,828 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5400, loss[loss=0.07837, simple_loss=0.107, pruned_loss=0.01705, audio_tagging_loss=0.007822, over 16002.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08907, pruned_loss=0.0124, audio_tagging_loss=0.008724, over 3034411.89 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:41:42,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3242426.6666666665, ans=0.125
2023-11-27 21:42:33,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2023-11-27 21:43:16,340 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486400
2023-11-27 21:43:34,638 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5450, loss[loss=0.07391, simple_loss=0.1024, pruned_loss=0.01376, audio_tagging_loss=0.008953, over 15852.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08991, pruned_loss=0.01266, audio_tagging_loss=0.008785, over 3034270.22 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:44:06,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3242760.0, ans=0.2
2023-11-27 21:44:50,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242893.3333333335, ans=0.1
2023-11-27 21:45:25,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.703e+01 9.302e+01 9.943e+01 1.219e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-27 21:45:27,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486450
2023-11-27 21:45:40,875 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5500, loss[loss=0.06983, simple_loss=0.09393, pruned_loss=0.01204, audio_tagging_loss=0.01082, over 15569.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08974, pruned_loss=0.01256, audio_tagging_loss=0.008871, over 3040307.48 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:46:03,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3243093.3333333335, ans=0.125
2023-11-27 21:47:28,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3243293.3333333335, ans=0.125
2023-11-27 21:47:32,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3243293.3333333335, ans=0.125
2023-11-27 21:47:37,369 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486500
2023-11-27 21:47:49,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3243293.3333333335, ans=0.0
2023-11-27 21:47:53,180 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5550, loss[loss=0.06291, simple_loss=0.08139, pruned_loss=0.01306, audio_tagging_loss=0.009154, over 15833.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09027, pruned_loss=0.0127, audio_tagging_loss=0.008948, over 3039112.30 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 8.0
2023-11-27 21:48:01,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3243360.0, ans=0.07
2023-11-27 21:48:03,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3243360.0, ans=0.0
2023-11-27 21:48:05,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3243360.0, ans=0.0
2023-11-27 21:48:55,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3243493.3333333335, ans=0.0
2023-11-27 21:49:21,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3243560.0, ans=0.05
2023-11-27 21:49:25,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3243560.0, ans=0.125
2023-11-27 21:49:40,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.844e+01 9.360e+01 9.886e+01 1.640e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-27 21:49:41,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486550
2023-11-27 21:49:53,621 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5600, loss[loss=0.04854, simple_loss=0.06505, pruned_loss=0.007157, audio_tagging_loss=0.008863, over 14693.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.0898, pruned_loss=0.01264, audio_tagging_loss=0.00899, over 3036883.15 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:50:34,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=22.5
2023-11-27 21:50:36,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3243760.0, ans=0.1
2023-11-27 21:50:54,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243826.6666666665, ans=0.1
2023-11-27 21:51:13,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3243893.3333333335, ans=0.0
2023-11-27 21:51:21,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3243893.3333333335, ans=0.0
2023-11-27 21:51:25,700 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 21:51:43,073 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486600
2023-11-27 21:51:58,866 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5650, loss[loss=0.05305, simple_loss=0.05648, pruned_loss=0.008942, audio_tagging_loss=0.01587, over 14984.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08937, pruned_loss=0.01246, audio_tagging_loss=0.00913, over 3039957.62 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:52:21,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3244093.3333333335, ans=0.0
2023-11-27 21:53:32,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.720e+01 9.211e+01 9.882e+01 1.405e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-27 21:53:32,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486650
2023-11-27 21:53:34,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3244293.3333333335, ans=0.0
2023-11-27 21:53:34,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244293.3333333335, ans=0.1
2023-11-27 21:53:41,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3244360.0, ans=0.125
2023-11-27 21:53:42,098 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5700, loss[loss=0.06452, simple_loss=0.08674, pruned_loss=0.01144, audio_tagging_loss=0.009703, over 14759.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08949, pruned_loss=0.01238, audio_tagging_loss=0.009049, over 3037962.48 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:53:53,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3244360.0, ans=0.125
2023-11-27 21:53:57,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3244360.0, ans=0.0
2023-11-27 21:54:12,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3244426.6666666665, ans=0.0
2023-11-27 21:54:22,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3244426.6666666665, ans=0.0
2023-11-27 21:54:30,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2023-11-27 21:55:16,708 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486700
2023-11-27 21:55:17,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0
2023-11-27 21:55:26,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2023-11-27 21:55:28,984 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5750, loss[loss=0.06763, simple_loss=0.08989, pruned_loss=0.01476, audio_tagging_loss=0.007922, over 15270.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08945, pruned_loss=0.01242, audio_tagging_loss=0.009017, over 3042811.39 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:55:37,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3244693.3333333335, ans=0.1
2023-11-27 21:55:40,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3244693.3333333335, ans=0.2
2023-11-27 21:55:50,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.72 vs. limit=10.0
2023-11-27 21:56:43,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3244893.3333333335, ans=0.125
2023-11-27 21:56:51,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3244960.0, ans=0.0
2023-11-27 21:56:55,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.667e+01 9.281e+01 1.002e+02 1.374e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-27 21:56:55,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486750
2023-11-27 21:57:08,429 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5800, loss[loss=0.04836, simple_loss=0.06757, pruned_loss=0.008634, audio_tagging_loss=0.005939, over 15404.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08968, pruned_loss=0.0125, audio_tagging_loss=0.008907, over 3040718.86 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:57:52,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3245160.0, ans=0.125
2023-11-27 21:58:29,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3245293.3333333335, ans=0.0
2023-11-27 21:58:30,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486800
2023-11-27 21:58:42,499 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5850, loss[loss=0.07878, simple_loss=0.1112, pruned_loss=0.01487, audio_tagging_loss=0.008306, over 14505.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09011, pruned_loss=0.01266, audio_tagging_loss=0.008764, over 3040937.94 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 21:58:43,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3245360.0, ans=0.2
2023-11-27 21:58:52,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0
2023-11-27 21:59:17,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3245426.6666666665, ans=0.0
2023-11-27 22:00:03,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.902e+01 9.558e+01 1.050e+02 1.471e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-27 22:00:04,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486850
2023-11-27 22:00:04,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3245626.6666666665, ans=0.2
2023-11-27 22:00:14,035 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5900, loss[loss=0.05594, simple_loss=0.07574, pruned_loss=0.008406, audio_tagging_loss=0.009667, over 14485.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09041, pruned_loss=0.01272, audio_tagging_loss=0.008752, over 3042183.19 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 22:00:22,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3245693.3333333335, ans=0.125
2023-11-27 22:01:27,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486900
2023-11-27 22:01:36,071 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 5950, loss[loss=0.04944, simple_loss=0.06184, pruned_loss=0.007986, audio_tagging_loss=0.01053, over 15517.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09046, pruned_loss=0.0126, audio_tagging_loss=0.008718, over 3040343.49 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 22:01:52,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3246093.3333333335, ans=0.125
2023-11-27 22:01:53,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-11-27 22:01:54,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3246093.3333333335, ans=0.125
2023-11-27 22:02:08,425 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 22:02:09,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3246160.0, ans=0.125
2023-11-27 22:02:43,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.680e+01 9.187e+01 9.808e+01 1.354e+02, threshold=1.837e+02, percent-clipped=0.0
2023-11-27 22:02:43,519 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 486950
2023-11-27 22:02:53,449 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6000, loss[loss=0.0624, simple_loss=0.0796, pruned_loss=0.01401, audio_tagging_loss=0.008588, over 15619.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09008, pruned_loss=0.01256, audio_tagging_loss=0.008707, over 3042272.03 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 22:02:53,449 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-27 22:03:35,225 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05724, simple_loss=0.05055, pruned_loss=0.005142, audio_tagging_loss=0.02682, over 4681554.00 frames.
2023-11-27 22:03:35,226 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-27 22:03:36,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3246360.0, ans=0.125 2023-11-27 22:03:52,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3246426.6666666665, ans=0.1 2023-11-27 22:04:01,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3246426.6666666665, ans=0.2 2023-11-27 22:04:05,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3246493.3333333335, ans=0.0 2023-11-27 22:04:09,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3246493.3333333335, ans=0.0 2023-11-27 22:04:25,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3246560.0, ans=0.0 2023-11-27 22:04:31,282 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 22:04:38,889 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-27 22:04:43,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3246626.6666666665, ans=0.125 2023-11-27 22:04:44,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3246626.6666666665, ans=0.025 2023-11-27 22:04:46,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3246693.3333333335, ans=0.1 2023-11-27 22:04:47,145 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6050, loss[loss=0.06758, simple_loss=0.09007, pruned_loss=0.01296, audio_tagging_loss=0.009586, over 15758.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09011, pruned_loss=0.01258, audio_tagging_loss=0.0087, over 3042544.68 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:04:52,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3246693.3333333335, ans=0.04949747468305833 2023-11-27 22:04:55,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3246693.3333333335, ans=0.0 2023-11-27 22:05:17,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-27 22:05:23,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-27 22:05:27,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3246893.3333333335, ans=0.0 2023-11-27 22:05:29,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3246893.3333333335, ans=0.1 2023-11-27 22:05:39,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3246893.3333333335, ans=0.125 2023-11-27 22:05:40,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.27 vs. limit=6.0 2023-11-27 22:05:47,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-27 22:05:48,481 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.706e+01 9.274e+01 9.905e+01 1.388e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 22:05:56,298 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6100, loss[loss=0.05772, simple_loss=0.07602, pruned_loss=0.0118, audio_tagging_loss=0.007911, over 15545.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09062, pruned_loss=0.01276, audio_tagging_loss=0.008634, over 3045133.40 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:06:09,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3247093.3333333335, ans=10.0 2023-11-27 22:06:24,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3247160.0, ans=0.0 2023-11-27 22:06:26,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3247160.0, ans=0.125 2023-11-27 22:06:38,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3247226.6666666665, ans=0.05 2023-11-27 22:06:56,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-27 22:07:04,118 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6150, loss[loss=0.07161, simple_loss=0.1035, pruned_loss=0.01424, audio_tagging_loss=0.005597, over 14483.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09023, pruned_loss=0.01282, audio_tagging_loss=0.008763, over 3045968.42 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:07:11,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3247360.0, ans=0.125 2023-11-27 22:07:16,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2023-11-27 22:07:22,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3247426.6666666665, ans=0.0 2023-11-27 22:07:22,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3247426.6666666665, ans=0.1 2023-11-27 22:07:39,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3247493.3333333335, ans=0.125 2023-11-27 22:07:41,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3247493.3333333335, ans=0.2 2023-11-27 22:07:55,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3247560.0, ans=0.2 2023-11-27 22:08:04,447 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-27 22:08:05,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.962e+01 9.637e+01 1.023e+02 1.658e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 22:08:11,805 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6200, loss[loss=0.07635, simple_loss=0.1076, pruned_loss=0.0172, audio_tagging_loss=0.005368, over 14697.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09029, pruned_loss=0.01269, audio_tagging_loss=0.008745, over 3041822.88 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:08:45,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3247826.6666666665, ans=0.125 2023-11-27 22:08:59,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3247893.3333333335, ans=0.0 2023-11-27 22:09:09,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3247960.0, ans=0.0 2023-11-27 22:09:09,957 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-27 22:09:17,732 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6250, loss[loss=0.06505, simple_loss=0.07996, pruned_loss=0.01245, audio_tagging_loss=0.01262, over 14021.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0897, pruned_loss=0.01265, audio_tagging_loss=0.008878, over 3041832.77 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:09:32,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3248093.3333333335, ans=0.04949747468305833 2023-11-27 22:09:37,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3248093.3333333335, ans=0.0 2023-11-27 22:09:38,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3248093.3333333335, ans=0.05 2023-11-27 22:09:47,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3248160.0, ans=0.125 2023-11-27 22:10:10,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3248293.3333333335, ans=0.125 2023-11-27 22:10:11,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2023-11-27 22:10:15,271 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-27 22:10:17,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.680e+01 9.045e+01 9.912e+01 1.334e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 22:10:22,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3248360.0, ans=0.125 2023-11-27 22:10:23,271 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6300, loss[loss=0.07759, simple_loss=0.1101, pruned_loss=0.01486, audio_tagging_loss=0.007681, over 15151.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08961, pruned_loss=0.01265, audio_tagging_loss=0.008972, over 3040066.41 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:10:45,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-27 22:11:03,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2023-11-27 22:11:07,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.67 vs. limit=10.0 2023-11-27 22:11:09,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3248560.0, ans=0.125 2023-11-27 22:11:20,389 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-27 22:11:27,391 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6350, loss[loss=0.0604, simple_loss=0.0752, pruned_loss=0.01249, audio_tagging_loss=0.01031, over 15259.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08989, pruned_loss=0.01259, audio_tagging_loss=0.008971, over 3045553.07 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:11:35,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3248693.3333333335, ans=0.05 2023-11-27 22:11:40,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3248760.0, ans=0.2 2023-11-27 22:11:48,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3248760.0, ans=0.125 2023-11-27 22:11:49,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3248760.0, ans=0.0 2023-11-27 22:11:49,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3248760.0, ans=0.125 2023-11-27 22:11:54,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-27 22:12:02,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-27 22:12:04,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3248893.3333333335, ans=0.125 2023-11-27 22:12:20,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3248960.0, ans=0.5 2023-11-27 22:12:23,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-27 22:12:24,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=15.0 2023-11-27 22:12:24,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.655e+01 9.162e+01 9.797e+01 1.327e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 22:12:31,061 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6400, loss[loss=0.06615, simple_loss=0.09685, pruned_loss=0.009414, audio_tagging_loss=0.008311, over 15197.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08917, pruned_loss=0.01249, audio_tagging_loss=0.009067, over 3042252.11 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:12:40,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3249026.6666666665, ans=0.125 2023-11-27 22:13:11,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3249226.6666666665, ans=0.125 2023-11-27 22:13:11,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3249226.6666666665, ans=0.2 2023-11-27 22:13:22,488 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:13:28,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-27 22:13:36,646 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6450, loss[loss=0.07887, simple_loss=0.1067, pruned_loss=0.01726, audio_tagging_loss=0.008275, over 14065.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08952, pruned_loss=0.01262, audio_tagging_loss=0.009168, over 3035205.69 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:13:47,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3249360.0, ans=0.125 2023-11-27 22:13:55,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3249426.6666666665, ans=0.125 2023-11-27 22:13:58,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3249426.6666666665, ans=0.1 2023-11-27 22:14:22,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2023-11-27 22:14:34,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-27 22:14:37,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.688e+01 9.330e+01 9.887e+01 1.158e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 22:14:42,172 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6500, loss[loss=0.05909, simple_loss=0.08274, pruned_loss=0.008317, audio_tagging_loss=0.009402, over 14621.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08951, pruned_loss=0.0125, audio_tagging_loss=0.009134, over 3031184.78 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:14:57,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3249760.0, ans=0.125 2023-11-27 22:15:11,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-11-27 22:15:17,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3249826.6666666665, ans=0.0 2023-11-27 22:15:23,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3249893.3333333335, ans=0.0 2023-11-27 22:15:38,431 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-27 22:15:45,965 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6550, loss[loss=0.08333, simple_loss=0.1073, pruned_loss=0.0199, audio_tagging_loss=0.009769, over 14258.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08865, pruned_loss=0.01229, audio_tagging_loss=0.009012, over 3030541.71 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:16:22,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3250160.0, ans=0.125 2023-11-27 22:16:42,990 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-27 22:16:45,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.718e+01 9.335e+01 9.836e+01 1.577e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 22:16:49,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-27 22:16:51,228 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6600, loss[loss=0.06933, simple_loss=0.111, pruned_loss=0.00909, audio_tagging_loss=0.004746, over 16107.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08887, pruned_loss=0.01238, audio_tagging_loss=0.00884, over 3026338.93 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:17:15,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3250426.6666666665, ans=0.125 2023-11-27 22:17:16,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3250493.3333333335, ans=0.0 2023-11-27 22:17:22,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3250493.3333333335, ans=0.035 2023-11-27 22:17:36,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3250560.0, ans=0.1 2023-11-27 22:17:36,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3250560.0, ans=0.125 2023-11-27 22:17:36,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3250560.0, ans=0.125 2023-11-27 22:17:47,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-27 22:17:47,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3250626.6666666665, ans=0.125 2023-11-27 22:17:51,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3250626.6666666665, ans=0.2 2023-11-27 22:17:56,529 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6650, loss[loss=0.06266, simple_loss=0.0791, pruned_loss=0.01241, audio_tagging_loss=0.01069, over 15485.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08943, pruned_loss=0.01227, audio_tagging_loss=0.008658, over 3030582.80 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:17:58,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3250693.3333333335, ans=0.125 2023-11-27 22:18:12,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3250760.0, ans=0.0 2023-11-27 22:18:49,408 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:18:52,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-27 22:18:55,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.676e+01 9.213e+01 9.807e+01 1.195e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 22:19:00,120 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6700, loss[loss=0.06169, simple_loss=0.08033, pruned_loss=0.01155, audio_tagging_loss=0.009972, over 15316.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0889, pruned_loss=0.01225, audio_tagging_loss=0.008643, over 3029313.98 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:19:14,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3251093.3333333335, ans=0.125 2023-11-27 22:19:21,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3251093.3333333335, ans=0.125 2023-11-27 22:19:23,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3251093.3333333335, ans=0.0 2023-11-27 22:19:32,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3251160.0, ans=0.125 2023-11-27 22:19:39,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3251226.6666666665, ans=0.0 2023-11-27 22:19:45,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3251226.6666666665, ans=0.05 2023-11-27 22:19:52,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3251293.3333333335, ans=0.95 2023-11-27 22:19:56,304 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-27 22:19:57,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3251293.3333333335, ans=0.125 2023-11-27 22:20:04,245 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6750, loss[loss=0.08216, simple_loss=0.1157, pruned_loss=0.01425, audio_tagging_loss=0.01007, over 15133.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08933, pruned_loss=0.01237, audio_tagging_loss=0.008686, over 3027444.75 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:20:12,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2023-11-27 22:20:18,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3251426.6666666665, ans=0.1 2023-11-27 22:20:33,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3251493.3333333335, ans=0.1 2023-11-27 22:20:38,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3251493.3333333335, ans=0.1 2023-11-27 22:20:59,944 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-27 22:21:02,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.663e+01 9.320e+01 9.783e+01 1.430e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 22:21:05,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3251626.6666666665, ans=0.125 2023-11-27 22:21:07,737 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6800, loss[loss=0.06344, simple_loss=0.08042, pruned_loss=0.01341, audio_tagging_loss=0.009823, over 14832.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08948, pruned_loss=0.01241, audio_tagging_loss=0.008751, over 3027696.62 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:21:27,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-27 22:21:45,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3251893.3333333335, ans=0.0 2023-11-27 22:21:54,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-11-27 22:22:03,488 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-27 22:22:11,469 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6850, loss[loss=0.06351, simple_loss=0.08482, pruned_loss=0.01249, audio_tagging_loss=0.00861, over 15743.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09003, pruned_loss=0.01259, audio_tagging_loss=0.00877, over 3032329.31 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:22:31,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3252093.3333333335, ans=0.1 2023-11-27 22:22:50,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3252226.6666666665, ans=0.125 2023-11-27 22:22:52,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-27 22:22:54,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2023-11-27 22:22:58,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3252226.6666666665, ans=0.0 2023-11-27 22:23:06,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-27 22:23:06,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3252293.3333333335, ans=0.2 2023-11-27 22:23:08,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2023-11-27 22:23:10,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.804e+01 9.259e+01 1.010e+02 1.279e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 22:23:14,173 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6900, loss[loss=0.05279, simple_loss=0.07389, pruned_loss=0.007465, audio_tagging_loss=0.008384, over 14561.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08864, pruned_loss=0.01229, audio_tagging_loss=0.008772, over 3033568.40 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:23:19,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3252360.0, ans=0.0 2023-11-27 22:23:24,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3252360.0, ans=0.09899494936611666 2023-11-27 22:23:34,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3252426.6666666665, ans=0.0 2023-11-27 22:23:35,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3252426.6666666665, ans=0.125 2023-11-27 22:23:51,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3252560.0, ans=0.125 2023-11-27 22:23:56,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3252560.0, ans=0.0 2023-11-27 22:24:03,559 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 22:24:07,470 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-27 22:24:13,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3252693.3333333335, ans=0.125 2023-11-27 22:24:14,309 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 6950, loss[loss=0.07601, simple_loss=0.1123, pruned_loss=0.0136, audio_tagging_loss=0.006267, over 15871.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08926, pruned_loss=0.01239, audio_tagging_loss=0.008707, over 3031397.35 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:24:20,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3252693.3333333335, ans=0.0 2023-11-27 22:24:27,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=12.0 2023-11-27 22:24:44,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-27 22:24:51,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3252893.3333333335, ans=0.2 2023-11-27 22:25:06,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3252960.0, ans=0.05 2023-11-27 22:25:11,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-27 22:25:17,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.695e+01 9.327e+01 1.020e+02 1.737e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 22:25:22,898 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7000, loss[loss=0.0554, simple_loss=0.07514, pruned_loss=0.00836, audio_tagging_loss=0.009467, over 13478.00 frames. 
], tot_loss[loss=0.06579, simple_loss=0.08921, pruned_loss=0.01243, audio_tagging_loss=0.008754, over 3035372.93 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:26:43,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3253160.0, ans=0.09899494936611666 2023-11-27 22:27:02,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3253160.0, ans=0.125 2023-11-27 22:27:22,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3253160.0, ans=0.0 2023-11-27 22:27:23,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3253160.0, ans=0.125 2023-11-27 22:28:42,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-27 22:29:11,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3253360.0, ans=0.125 2023-11-27 22:29:11,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3253360.0, ans=0.1 2023-11-27 22:29:17,526 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7050, loss[loss=0.04939, simple_loss=0.0679, pruned_loss=0.006974, audio_tagging_loss=0.008465, over 14521.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08996, pruned_loss=0.0124, audio_tagging_loss=0.00871, over 3039521.42 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:29:49,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3253360.0, ans=0.1 2023-11-27 22:30:01,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3253360.0, ans=0.125 2023-11-27 22:31:55,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3253560.0, ans=0.125 2023-11-27 22:32:00,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3253560.0, ans=0.125 2023-11-27 22:32:28,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3253626.6666666665, ans=0.125 2023-11-27 22:32:42,628 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-27 22:33:04,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.458e+01 9.244e+01 1.037e+02 2.754e+02, threshold=1.849e+02, percent-clipped=1.0 2023-11-27 22:33:16,799 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7100, loss[loss=0.07962, simple_loss=0.1083, pruned_loss=0.01697, audio_tagging_loss=0.008498, over 15111.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08922, pruned_loss=0.01226, audio_tagging_loss=0.008819, over 3044628.58 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:33:37,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=15.0 2023-11-27 22:34:18,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. 
limit=15.0 2023-11-27 22:34:42,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-27 22:35:45,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3253893.3333333335, ans=0.125 2023-11-27 22:36:45,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3253960.0, ans=0.05 2023-11-27 22:36:48,269 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-27 22:37:13,780 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7150, loss[loss=0.07179, simple_loss=0.08694, pruned_loss=0.01634, audio_tagging_loss=0.01198, over 14521.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08952, pruned_loss=0.01238, audio_tagging_loss=0.008853, over 3037956.49 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:37:56,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3254093.3333333335, ans=0.07 2023-11-27 22:38:06,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3254093.3333333335, ans=0.125 2023-11-27 22:38:06,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3254093.3333333335, ans=0.125 2023-11-27 22:39:11,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3254160.0, ans=0.0 2023-11-27 22:39:48,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3254226.6666666665, ans=0.0 2023-11-27 22:39:51,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-27 22:40:25,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3254293.3333333335, ans=0.0 2023-11-27 22:40:28,532 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-27 22:40:28,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3254293.3333333335, ans=0.2 2023-11-27 22:40:45,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.864e+01 9.452e+01 1.007e+02 1.551e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 22:41:02,127 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7200, loss[loss=0.055, simple_loss=0.06972, pruned_loss=0.009854, audio_tagging_loss=0.01028, over 14988.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09046, pruned_loss=0.01252, audio_tagging_loss=0.008842, over 3043173.10 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:41:26,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3254360.0, ans=0.1 2023-11-27 22:41:29,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3254360.0, ans=0.125 2023-11-27 22:41:53,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. 
limit=6.0 2023-11-27 22:43:07,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3254560.0, ans=0.125 2023-11-27 22:43:43,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3254626.6666666665, ans=0.125 2023-11-27 22:43:46,628 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-27 22:44:10,503 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7250, loss[loss=0.0873, simple_loss=0.1249, pruned_loss=0.01859, audio_tagging_loss=0.006271, over 16381.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08967, pruned_loss=0.01249, audio_tagging_loss=0.008928, over 3045389.70 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:44:17,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3254693.3333333335, ans=0.0 2023-11-27 22:44:51,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3254760.0, ans=0.04949747468305833 2023-11-27 22:45:11,076 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:46:12,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3254960.0, ans=0.2 2023-11-27 22:46:15,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3254960.0, ans=0.1 2023-11-27 22:46:30,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-27 22:46:30,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3254960.0, ans=0.125 2023-11-27 22:46:42,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.745e+01 9.249e+01 1.003e+02 1.162e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 22:46:47,146 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7300, loss[loss=0.0608, simple_loss=0.09319, pruned_loss=0.008839, audio_tagging_loss=0.005366, over 14414.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09031, pruned_loss=0.01254, audio_tagging_loss=0.008803, over 3050790.33 frames. 
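The scaling.py:213 entries above track ScheduledFloat values: hyperparameters such as dropout probabilities, bypass/skip rates and balancer probabilities that are annealed as a function of batch_count instead of being held fixed. A minimal sketch of such a piecewise-linear schedule follows; the class and breakpoint values are illustrative, not the actual icefall implementation.

    # Sketch of a piecewise-linear float schedule, assuming the value is
    # interpolated between (batch_count, value) breakpoints and clamped at
    # the ends. Names and numbers are illustrative, not icefall's API.
    class ScheduledFloat:
        def __init__(self, *points):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
    drop = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert abs(drop.value(10000.0) - 0.2) < 1e-9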
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:46:47,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3255026.6666666665, ans=0.0 2023-11-27 22:48:26,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3255226.6666666665, ans=0.1 2023-11-27 22:48:36,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3255226.6666666665, ans=0.125 2023-11-27 22:48:44,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3255226.6666666665, ans=0.0 2023-11-27 22:49:05,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-27 22:49:10,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3255293.3333333335, ans=0.125 2023-11-27 22:49:21,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3255360.0, ans=0.125 2023-11-27 22:49:24,744 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7350, loss[loss=0.06166, simple_loss=0.09166, pruned_loss=0.01075, audio_tagging_loss=0.005085, over 13849.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08992, pruned_loss=0.01261, audio_tagging_loss=0.008651, over 3051467.72 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:51:51,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3255626.6666666665, ans=0.5 2023-11-27 22:52:03,421 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-27 22:52:13,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3255626.6666666665, ans=0.125 2023-11-27 22:52:15,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.717e+01 9.286e+01 1.027e+02 1.219e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 22:52:20,683 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7400, loss[loss=0.06129, simple_loss=0.08084, pruned_loss=0.01091, audio_tagging_loss=0.009954, over 15650.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08923, pruned_loss=0.01251, audio_tagging_loss=0.008736, over 3049229.48 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:52:34,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-27 22:52:50,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-27 22:52:56,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3255760.0, ans=0.125 2023-11-27 22:53:22,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3255760.0, ans=0.2 2023-11-27 22:53:37,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-27 22:54:08,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3255893.3333333335, ans=0.125 2023-11-27 22:54:36,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0 2023-11-27 22:54:59,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-27 22:55:21,632 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7450, loss[loss=0.05749, simple_loss=0.08168, pruned_loss=0.008781, audio_tagging_loss=0.007875, over 16282.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08845, pruned_loss=0.01239, audio_tagging_loss=0.008811, over 3045514.48 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:56:30,348 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:57:24,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5 2023-11-27 22:57:48,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-27 22:57:59,976 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.653e+01 9.263e+01 9.964e+01 1.295e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 22:58:07,687 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7500, loss[loss=0.05409, simple_loss=0.07184, pruned_loss=0.008999, audio_tagging_loss=0.009168, over 15589.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08914, pruned_loss=0.01249, audio_tagging_loss=0.0088, over 3049061.09 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:00:42,976 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-27 23:00:50,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3256626.6666666665, ans=0.1 2023-11-27 23:00:57,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2023-11-27 23:01:02,845 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7550, loss[loss=0.07406, simple_loss=0.1095, pruned_loss=0.01386, audio_tagging_loss=0.005432, over 15808.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08959, pruned_loss=0.01258, audio_tagging_loss=0.00877, over 3053174.38 frames. 
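The scaling.py:1022 Whitening lines report a per-module metric against a limit (e.g. metric=3.31 vs. limit=15.0). The metric measures how far the activations' covariance is from isotropic; one plausible definition, assumed here, is the ratio mean(eig^2)/mean(eig)^2 over the covariance eigenvalues, which equals 1.0 for perfectly whitened features and grows as variance concentrates in a few directions. A penalty would only bite while the metric exceeds the limit.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Split channels into groups, estimate
        # each group's covariance, and measure non-isotropy as
        # mean(eig^2) / mean(eig)^2, which is 1.0 iff all eigenvalues agree.
        # Assumed definition; the logged metric may differ in detail.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n   # (groups, d, d)
        eigs = torch.linalg.eigvalsh(cov)              # real, ascending
        metric = (eigs ** 2).mean() / eigs.mean() ** 2
        return metric.item()

    x = torch.randn(1000, 192)
    print(whitening_metric(x))   # close to 1.0 for i.i.d. Gaussian features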
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:02:13,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3256826.6666666665, ans=0.5 2023-11-27 23:03:30,450 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-27 23:03:43,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.710e+01 9.273e+01 1.023e+02 1.229e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 23:03:47,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3257026.6666666665, ans=0.125 2023-11-27 23:03:49,510 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7600, loss[loss=0.07562, simple_loss=0.1114, pruned_loss=0.01476, audio_tagging_loss=0.005164, over 16469.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08914, pruned_loss=0.01258, audio_tagging_loss=0.008868, over 3060288.18 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:03:59,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3257026.6666666665, ans=0.125 2023-11-27 23:04:16,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-27 23:04:18,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2023-11-27 23:04:42,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3257093.3333333335, ans=0.0 2023-11-27 23:04:58,192 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:05:41,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3257226.6666666665, ans=0.04949747468305833 2023-11-27 23:06:02,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3257293.3333333335, ans=0.1 2023-11-27 23:06:07,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-27 23:06:25,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-27 23:06:26,642 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7650, loss[loss=0.06839, simple_loss=0.09063, pruned_loss=0.01509, audio_tagging_loss=0.007978, over 15077.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.0886, pruned_loss=0.01227, audio_tagging_loss=0.008828, over 3061177.80 frames. 
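The bracketed per-batch losses are internally consistent with a fixed linear combination of the three components, with the simple (linear-alignment) transducer loss down-weighted by 0.5 and the audio-tagging loss at scale 1.0. A quick arithmetic check against the batch 7600 entry above, assuming exactly that weighting:

    # Sanity check (assumption): the logged per-batch `loss` is a fixed
    # linear combination of its components, simple loss scaled by 0.5 and
    # the audio-tagging loss by 1.0.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, tagging_scale=1.0):
        return (simple_scale * simple_loss + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # Batch 7600: loss=0.07562, simple=0.1114, pruned=0.01476, tagging=0.005164
    print(combined_loss(0.1114, 0.01476, 0.005164))  # -> 0.075624, matching the log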
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:06:30,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3257360.0, ans=0.05 2023-11-27 23:06:58,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3257426.6666666665, ans=0.0 2023-11-27 23:07:27,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3257493.3333333335, ans=0.125 2023-11-27 23:07:34,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3257493.3333333335, ans=0.05 2023-11-27 23:08:16,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3257560.0, ans=0.125 2023-11-27 23:08:36,944 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-27 23:08:50,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.807e+01 9.447e+01 1.017e+02 1.729e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 23:08:53,445 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7700, loss[loss=0.08096, simple_loss=0.115, pruned_loss=0.01756, audio_tagging_loss=0.005924, over 15607.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08968, pruned_loss=0.01244, audio_tagging_loss=0.008678, over 3058850.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:09:50,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-27 23:09:55,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2023-11-27 23:10:52,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3257960.0, ans=0.125 2023-11-27 23:10:55,056 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-27 23:11:07,615 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:11:16,954 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7750, loss[loss=0.05275, simple_loss=0.07252, pruned_loss=0.008394, audio_tagging_loss=0.008095, over 14540.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09015, pruned_loss=0.01261, audio_tagging_loss=0.008697, over 3061166.83 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:11:35,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-11-27 23:11:51,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=12.0 2023-11-27 23:12:36,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.68 vs. 
limit=15.0 2023-11-27 23:12:54,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3258226.6666666665, ans=0.1 2023-11-27 23:13:42,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-27 23:13:54,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=12.0 2023-11-27 23:13:55,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.828e+01 9.509e+01 1.004e+02 1.323e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 23:13:58,049 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7800, loss[loss=0.05413, simple_loss=0.06627, pruned_loss=0.00952, audio_tagging_loss=0.01147, over 15598.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.0906, pruned_loss=0.01256, audio_tagging_loss=0.008679, over 3057327.94 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:14:34,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-27 23:14:37,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3258426.6666666665, ans=0.0 2023-11-27 23:15:25,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=10.0 2023-11-27 23:15:45,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3258626.6666666665, ans=0.125 2023-11-27 23:15:51,058 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-27 23:15:57,429 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:16:08,516 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7850, loss[loss=0.05123, simple_loss=0.0594, pruned_loss=0.009919, audio_tagging_loss=0.01161, over 15013.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08985, pruned_loss=0.01248, audio_tagging_loss=0.00878, over 3046590.89 frames. 
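The optim.py:476 lines print five quantiles (min/25%/50%/75%/max) of recent gradient norms alongside a clipping threshold, and the threshold consistently sits at about 2.0 times the logged median, matching Clipping_scale=2.0 (e.g. 2.0 x 9.509e+01 = 1.902e+02 just above). In other words, the clipping cutoff adapts to the recent grad-norm distribution rather than being a fixed constant. A sketch under that assumption:

    import collections

    # Sketch of adaptive gradient-norm clipping, assuming the threshold is
    # clipping_scale times the median norm over a recent window. Illustrative
    # only; the optimizer's bookkeeping is more involved than a deque.
    class AdaptiveClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def clip_factor(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.scale * median
            return min(1.0, threshold / (grad_norm + 1e-20))

    clip = AdaptiveClipper()
    for g in [90.0, 85.0, 95.0, 100.0, 275.0]:
        print(clip.clip_factor(g))  # only the 275.0 outlier gets scaled down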
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:16:40,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3258760.0, ans=0.2 2023-11-27 23:16:46,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3258760.0, ans=0.125 2023-11-27 23:17:06,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3258826.6666666665, ans=0.125 2023-11-27 23:17:47,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3258960.0, ans=0.1 2023-11-27 23:17:52,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-27 23:17:52,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3258960.0, ans=0.1 2023-11-27 23:18:02,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.704e+01 9.225e+01 1.001e+02 1.986e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 23:18:06,396 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7900, loss[loss=0.04688, simple_loss=0.05543, pruned_loss=0.007909, audio_tagging_loss=0.01126, over 14344.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08902, pruned_loss=0.01227, audio_tagging_loss=0.008948, over 3050254.58 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:18:25,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=22.5 2023-11-27 23:18:27,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3259093.3333333335, ans=0.125 2023-11-27 23:18:29,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3259093.3333333335, ans=0.1 2023-11-27 23:19:13,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3259160.0, ans=10.0 2023-11-27 23:19:20,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3259226.6666666665, ans=0.0 2023-11-27 23:19:38,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-27 23:19:48,486 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-27 23:20:01,583 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 7950, loss[loss=0.0579, simple_loss=0.06642, pruned_loss=0.01377, audio_tagging_loss=0.01091, over 13783.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08888, pruned_loss=0.01228, audio_tagging_loss=0.009057, over 3056337.20 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:20:15,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0 2023-11-27 23:20:27,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.29 vs. limit=10.0 2023-11-27 23:20:30,366 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:20:55,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3259493.3333333335, ans=0.125 2023-11-27 23:20:57,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3259493.3333333335, ans=0.0 2023-11-27 23:21:29,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-27 23:21:37,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.632e+01 9.434e+01 1.008e+02 1.251e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 23:21:39,878 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8000, loss[loss=0.07977, simple_loss=0.109, pruned_loss=0.01679, audio_tagging_loss=0.008477, over 15514.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08889, pruned_loss=0.01232, audio_tagging_loss=0.009061, over 3049400.46 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:22:10,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.82 vs. limit=10.0 2023-11-27 23:22:17,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3259826.6666666665, ans=0.125 2023-11-27 23:22:31,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3259826.6666666665, ans=0.125 2023-11-27 23:22:32,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3259826.6666666665, ans=0.125 2023-11-27 23:22:53,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-27 23:22:55,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3259893.3333333335, ans=0.0 2023-11-27 23:23:02,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3259960.0, ans=0.1 2023-11-27 23:23:06,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-27 23:23:06,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3259960.0, ans=0.125 2023-11-27 23:23:17,662 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8050, loss[loss=0.08093, simple_loss=0.106, pruned_loss=0.0192, audio_tagging_loss=0.008717, over 15149.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08991, pruned_loss=0.01252, audio_tagging_loss=0.009035, over 3054808.97 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:24:21,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. 
limit=15.0 2023-11-27 23:24:23,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3260226.6666666665, ans=0.1 2023-11-27 23:24:24,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3260226.6666666665, ans=0.125 2023-11-27 23:24:39,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-27 23:24:40,477 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-27 23:24:48,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3260293.3333333335, ans=0.0 2023-11-27 23:24:49,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.685e+01 9.405e+01 9.974e+01 1.162e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 23:24:51,281 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8100, loss[loss=0.0542, simple_loss=0.07114, pruned_loss=0.008912, audio_tagging_loss=0.009716, over 14698.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09014, pruned_loss=0.01259, audio_tagging_loss=0.008999, over 3048582.46 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:25:08,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-27 23:25:24,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-27 23:25:35,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3260493.3333333335, ans=0.0 2023-11-27 23:25:37,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3260493.3333333335, ans=0.125 2023-11-27 23:25:52,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3260560.0, ans=0.125 2023-11-27 23:26:13,672 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-27 23:26:23,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2023-11-27 23:26:24,238 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8150, loss[loss=0.0562, simple_loss=0.08278, pruned_loss=0.008519, audio_tagging_loss=0.006295, over 16292.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09026, pruned_loss=0.01259, audio_tagging_loss=0.008858, over 3049064.85 frames. 
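The WARNING entries dropping AudioSet placeholder cuts follow from a length check: a 1-second cut has 100 feature frames, which shrink to 23 after the convolutional front-end's roughly 4x subsampling, and 23 output frames cannot align the 24 BPE tokens of the dummy transcript, so the transducer loss is undefined for that cut. A sketch of the check, with the length formula inferred from the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv front-end length formula assumed from the log (100 -> 23),
        # consistent with an overall subsampling factor of about 4.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer-style loss needs at least one output frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> cut excluded, as logged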
], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:26:39,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3260693.3333333335, ans=0.0 2023-11-27 23:27:20,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3260893.3333333335, ans=0.2 2023-11-27 23:27:43,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-27 23:27:50,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.874e+01 9.379e+01 1.019e+02 1.298e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:27:52,127 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8200, loss[loss=0.08197, simple_loss=0.1183, pruned_loss=0.01545, audio_tagging_loss=0.007363, over 16010.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.0908, pruned_loss=0.0126, audio_tagging_loss=0.008733, over 3048248.75 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:27:56,825 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:29:04,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-27 23:29:04,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3261293.3333333335, ans=0.0 2023-11-27 23:29:13,166 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8250, loss[loss=0.0448, simple_loss=0.06166, pruned_loss=0.005194, audio_tagging_loss=0.008773, over 14192.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0895, pruned_loss=0.01237, audio_tagging_loss=0.008807, over 3045867.42 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:29:19,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3261360.0, ans=0.125 2023-11-27 23:29:26,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3261426.6666666665, ans=0.125 2023-11-27 23:29:37,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3261426.6666666665, ans=0.125 2023-11-27 23:30:06,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3261560.0, ans=0.1 2023-11-27 23:30:18,193 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-27 23:30:22,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3261626.6666666665, ans=0.125 2023-11-27 23:30:27,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.905e+01 9.510e+01 1.029e+02 2.089e+02, threshold=1.902e+02, percent-clipped=1.0 2023-11-27 23:30:27,270 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8300, loss[loss=0.0861, simple_loss=0.1175, pruned_loss=0.01893, audio_tagging_loss=0.008432, over 17639.00 frames. 
], tot_loss[loss=0.06586, simple_loss=0.089, pruned_loss=0.01246, audio_tagging_loss=0.008903, over 3051501.18 frames. ], batch size: 67, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:30:35,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3261693.3333333335, ans=0.125 2023-11-27 23:30:57,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3261826.6666666665, ans=10.0 2023-11-27 23:30:58,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3261826.6666666665, ans=0.125 2023-11-27 23:31:02,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3261826.6666666665, ans=0.09899494936611666 2023-11-27 23:31:27,174 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-27 23:31:35,085 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8350, loss[loss=0.07046, simple_loss=0.09797, pruned_loss=0.01534, audio_tagging_loss=0.006139, over 14089.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08937, pruned_loss=0.01254, audio_tagging_loss=0.008819, over 3054244.27 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:32:33,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-27 23:32:46,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.783e+01 9.379e+01 1.006e+02 1.235e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:32:46,991 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8400, loss[loss=0.06944, simple_loss=0.09556, pruned_loss=0.01293, audio_tagging_loss=0.008724, over 16034.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08921, pruned_loss=0.01239, audio_tagging_loss=0.008831, over 3047624.23 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:32:51,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3262360.0, ans=0.1 2023-11-27 23:32:56,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3262360.0, ans=0.125 2023-11-27 23:33:29,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-27 23:34:27,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3262493.3333333335, ans=0.1 2023-11-27 23:35:42,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3262560.0, ans=0.2 2023-11-27 23:36:19,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3262626.6666666665, ans=0.125 2023-11-27 23:36:28,310 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-27 23:36:59,273 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8450, loss[loss=0.05262, simple_loss=0.06625, pruned_loss=0.008417, audio_tagging_loss=0.01108, over 15018.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08929, pruned_loss=0.01238, audio_tagging_loss=0.008881, over 3052203.57 frames. 
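Many of the scheduled values above belong to balancer modules (balancer1.prob, balancer2.min_positive, balancer1.max_abs, ...). A balancer keeps per-channel activation statistics inside a target range, such as the fraction of positive values and the mean absolute value, by correcting gradients in the backward pass, and is applied stochastically with the scheduled prob (mostly 0.125 here). The sketch below only measures the statistics such a module would constrain; the gradient surgery itself is omitted and the default bounds are illustrative.

    import torch

    # Measure the per-channel statistics a balancer-style module would keep
    # in range, for activations x of shape (frames, channels). Assumed
    # semantics of min_positive / max_abs-style constraints; sketch only.
    def balancer_violations(x, min_positive=0.05, max_positive=0.95,
                            min_abs=0.2, max_abs=10.0):
        frac_positive = (x > 0).float().mean(dim=0)   # per-channel sign balance
        mean_abs = x.abs().mean(dim=0)                # per-channel magnitude
        return {
            "too_negative": (frac_positive < min_positive).sum().item(),
            "too_positive": (frac_positive > max_positive).sum().item(),
            "too_small": (mean_abs < min_abs).sum().item(),
            "too_large": (mean_abs > max_abs).sum().item(),
        }

    print(balancer_violations(torch.randn(1000, 256)))  # all zeros, roughly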
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:37:16,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3262693.3333333335, ans=0.125 2023-11-27 23:37:51,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3262760.0, ans=0.0 2023-11-27 23:39:49,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3262893.3333333335, ans=0.0 2023-11-27 23:39:52,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=8.0 2023-11-27 23:40:19,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-27 23:40:33,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3262960.0, ans=0.125 2023-11-27 23:40:53,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.777e+01 9.452e+01 1.015e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 23:40:53,323 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8500, loss[loss=0.05454, simple_loss=0.07558, pruned_loss=0.007532, audio_tagging_loss=0.00922, over 14738.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08938, pruned_loss=0.01246, audio_tagging_loss=0.008874, over 3053416.27 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:43:44,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3263226.6666666665, ans=0.125 2023-11-27 23:43:52,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=22.5 2023-11-27 23:44:18,400 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-27 23:44:29,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263293.3333333335, ans=0.1 2023-11-27 23:44:43,007 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8550, loss[loss=0.05867, simple_loss=0.08139, pruned_loss=0.009008, audio_tagging_loss=0.008967, over 15592.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08943, pruned_loss=0.01255, audio_tagging_loss=0.008931, over 3058233.98 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:45:18,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3263426.6666666665, ans=0.125 2023-11-27 23:46:29,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3263626.6666666665, ans=0.125 2023-11-27 23:46:31,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-11-27 23:46:36,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-27 23:46:50,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.577e+01 1.042e+02 1.217e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 23:46:50,877 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8600, loss[loss=0.06608, simple_loss=0.09482, pruned_loss=0.01088, audio_tagging_loss=0.007789, over 14792.00 frames. 
], tot_loss[loss=0.06629, simple_loss=0.08997, pruned_loss=0.0125, audio_tagging_loss=0.008796, over 3060642.47 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:47:40,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-11-27 23:47:51,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3263826.6666666665, ans=0.035 2023-11-27 23:48:34,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3263960.0, ans=0.1 2023-11-27 23:48:39,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-27 23:48:39,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3263960.0, ans=0.125 2023-11-27 23:48:45,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3263960.0, ans=0.0 2023-11-27 23:48:47,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3263960.0, ans=0.125 2023-11-27 23:48:54,568 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8650, loss[loss=0.08696, simple_loss=0.1155, pruned_loss=0.02158, audio_tagging_loss=0.007608, over 16606.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09102, pruned_loss=0.01269, audio_tagging_loss=0.00877, over 3054013.83 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:49:19,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3264093.3333333335, ans=0.0 2023-11-27 23:49:42,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-11-27 23:49:48,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3264160.0, ans=0.125 2023-11-27 23:49:54,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=22.5 2023-11-27 23:50:07,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3264226.6666666665, ans=0.125 2023-11-27 23:50:37,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3264293.3333333335, ans=0.0 2023-11-27 23:50:44,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-27 23:50:58,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 8.922e+01 9.759e+01 1.039e+02 1.261e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 23:50:58,577 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8700, loss[loss=0.05734, simple_loss=0.07259, pruned_loss=0.009672, audio_tagging_loss=0.01137, over 14731.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09113, pruned_loss=0.01275, audio_tagging_loss=0.008812, over 3051293.53 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:50:59,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3264360.0, ans=0.1 2023-11-27 23:51:45,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3264493.3333333335, ans=0.0 2023-11-27 23:52:03,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3264493.3333333335, ans=0.2 2023-11-27 23:52:17,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3264560.0, ans=0.07 2023-11-27 23:52:21,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2023-11-27 23:52:47,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-27 23:52:52,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3264626.6666666665, ans=0.125 2023-11-27 23:53:01,364 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8750, loss[loss=0.0608, simple_loss=0.08307, pruned_loss=0.01153, audio_tagging_loss=0.00774, over 15793.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.0916, pruned_loss=0.01275, audio_tagging_loss=0.008802, over 3052282.62 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:54:40,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-27 23:54:51,426 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-27 23:54:51,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3264960.0, ans=0.0 2023-11-27 23:54:52,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3264960.0, ans=0.2 2023-11-27 23:55:06,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.769e+01 9.393e+01 1.008e+02 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 23:55:06,088 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8800, loss[loss=0.05915, simple_loss=0.08626, pruned_loss=0.007838, audio_tagging_loss=0.008186, over 14689.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09188, pruned_loss=0.0128, audio_tagging_loss=0.008877, over 3062884.32 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:55:11,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3265026.6666666665, ans=0.125 2023-11-27 23:55:23,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3265026.6666666665, ans=0.0 2023-11-27 23:56:33,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3265226.6666666665, ans=0.125 2023-11-27 23:56:33,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.98 vs. 
limit=22.5 2023-11-27 23:56:52,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-27 23:57:07,667 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8850, loss[loss=0.07467, simple_loss=0.1002, pruned_loss=0.01525, audio_tagging_loss=0.00933, over 14811.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.0915, pruned_loss=0.01283, audio_tagging_loss=0.008944, over 3053979.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:57:24,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3265360.0, ans=0.0 2023-11-27 23:57:35,181 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:57:37,943 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:57:41,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3265426.6666666665, ans=0.0 2023-11-27 23:58:13,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-27 23:58:32,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3265560.0, ans=0.125 2023-11-27 23:58:43,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3265626.6666666665, ans=0.0 2023-11-27 23:58:47,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3265626.6666666665, ans=0.2 2023-11-27 23:58:50,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-27 23:58:59,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3265626.6666666665, ans=10.0 2023-11-27 23:59:03,397 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8900, loss[loss=0.07269, simple_loss=0.102, pruned_loss=0.01225, audio_tagging_loss=0.009465, over 16029.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09176, pruned_loss=0.01283, audio_tagging_loss=0.008845, over 3054922.30 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:59:05,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.603e+01 9.158e+01 9.792e+01 1.158e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 23:59:12,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-27 23:59:27,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. 
limit=15.0 2023-11-27 23:59:53,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3265826.6666666665, ans=0.1 2023-11-28 00:00:12,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3265826.6666666665, ans=0.1 2023-11-28 00:00:30,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3265893.3333333335, ans=0.125 2023-11-28 00:00:40,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3265893.3333333335, ans=0.125 2023-11-28 00:00:44,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3265960.0, ans=0.2 2023-11-28 00:00:56,540 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-28 00:01:03,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3265960.0, ans=0.125 2023-11-28 00:01:10,624 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 8950, loss[loss=0.05922, simple_loss=0.08598, pruned_loss=0.0108, audio_tagging_loss=0.005433, over 16904.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09101, pruned_loss=0.0127, audio_tagging_loss=0.008632, over 3048799.29 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:01:12,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2023-11-28 00:01:32,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3266026.6666666665, ans=22.5 2023-11-28 00:01:33,906 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:01:52,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2023-11-28 00:02:21,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2023-11-28 00:02:22,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3266226.6666666665, ans=0.05 2023-11-28 00:02:46,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3266293.3333333335, ans=0.07 2023-11-28 00:02:56,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-28 00:02:57,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3266293.3333333335, ans=10.0 2023-11-28 00:03:10,806 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9000, loss[loss=0.07761, simple_loss=0.1085, pruned_loss=0.01676, audio_tagging_loss=0.006608, over 15832.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09226, pruned_loss=0.01289, audio_tagging_loss=0.008557, over 3059195.99 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:03:10,807 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 00:03:35,769 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1442, 3.6953, 4.1003, 3.7413], device='cuda:1') 2023-11-28 00:04:14,785 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05061, pruned_loss=0.005195, audio_tagging_loss=0.02785, over 4681554.00 frames. 2023-11-28 00:04:14,787 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 00:04:16,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.840e+01 9.454e+01 9.905e+01 1.337e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 00:04:21,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3266360.0, ans=0.125 2023-11-28 00:04:41,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-28 00:04:44,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266426.6666666665, ans=0.1 2023-11-28 00:05:36,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2023-11-28 00:05:44,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. limit=10.0 2023-11-28 00:06:02,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-28 00:06:17,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.43 vs. limit=10.0 2023-11-28 00:06:17,948 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9050, loss[loss=0.0719, simple_loss=0.1024, pruned_loss=0.01103, audio_tagging_loss=0.009657, over 15614.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.0914, pruned_loss=0.01276, audio_tagging_loss=0.00853, over 3057245.29 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:06:41,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5 2023-11-28 00:07:31,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3266893.3333333335, ans=0.07 2023-11-28 00:07:40,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3266893.3333333335, ans=0.0 2023-11-28 00:08:05,076 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-28 00:08:19,745 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9100, loss[loss=0.0692, simple_loss=0.09636, pruned_loss=0.0136, audio_tagging_loss=0.007416, over 14052.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09212, pruned_loss=0.01281, audio_tagging_loss=0.008477, over 3053242.91 frames. 
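The validation pass at batch 9000 also logs attn_weights_entropy tensors from zipformer.py:1877, one value per attention head: near-uniform attention over roughly a hundred keys gives entropy close to log(120) ~ 4.8, the magnitude seen here, while a head that collapses onto a few frames would read much lower. A sketch of the assumed diagnostic:

    import torch

    # Assumed diagnostic: mean Shannon entropy of attention weights per head.
    # attn: (num_heads, num_queries, num_keys), rows softmax-normalized.
    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)                           # one value per head

    attn = torch.softmax(torch.randn(4, 50, 120), dim=-1)
    print(attn_weights_entropy(attn))  # near log(120) ~ 4.79 when diffuse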
], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:08:22,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.819e+01 9.395e+01 1.013e+02 1.222e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 00:08:31,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3267026.6666666665, ans=0.125 2023-11-28 00:08:59,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2023-11-28 00:09:01,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3267093.3333333335, ans=0.0 2023-11-28 00:10:04,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-28 00:10:15,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3267360.0, ans=0.2 2023-11-28 00:10:17,968 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9150, loss[loss=0.05844, simple_loss=0.08056, pruned_loss=0.0104, audio_tagging_loss=0.007757, over 15287.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09218, pruned_loss=0.01283, audio_tagging_loss=0.008499, over 3055073.89 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:10:57,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3267426.6666666665, ans=0.125 2023-11-28 00:10:59,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-28 00:11:57,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-28 00:12:08,879 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9200, loss[loss=0.07497, simple_loss=0.1082, pruned_loss=0.01414, audio_tagging_loss=0.006724, over 14940.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09078, pruned_loss=0.0127, audio_tagging_loss=0.008594, over 3049530.62 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:12:11,677 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.944e+01 9.391e+01 1.026e+02 1.333e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 00:12:28,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3267693.3333333335, ans=0.0 2023-11-28 00:12:34,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3267760.0, ans=0.0 2023-11-28 00:12:50,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0 2023-11-28 00:13:56,372 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-28 00:14:09,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-28 00:14:13,653 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9250, loss[loss=0.05072, simple_loss=0.06826, pruned_loss=0.008987, audio_tagging_loss=0.007604, over 14870.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.0904, pruned_loss=0.01269, audio_tagging_loss=0.008594, over 3052214.80 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:14:34,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-11-28 00:15:16,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3268160.0, ans=0.0 2023-11-28 00:15:37,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3268226.6666666665, ans=0.125 2023-11-28 00:16:08,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0 2023-11-28 00:16:09,041 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-28 00:16:23,536 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9300, loss[loss=0.08648, simple_loss=0.1144, pruned_loss=0.01864, audio_tagging_loss=0.01063, over 15900.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08962, pruned_loss=0.0124, audio_tagging_loss=0.008697, over 3053969.76 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:16:27,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.477e+01 9.136e+01 9.623e+01 1.227e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-28 00:17:01,315 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:17:23,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3268493.3333333335, ans=0.125 2023-11-28 00:17:43,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3268560.0, ans=0.09899494936611666 2023-11-28 00:17:43,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3268560.0, ans=0.0 2023-11-28 00:18:11,014 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-28 00:18:23,594 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9350, loss[loss=0.04445, simple_loss=0.05967, pruned_loss=0.004127, audio_tagging_loss=0.01049, over 16445.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08926, pruned_loss=0.01238, audio_tagging_loss=0.008775, over 3057625.52 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:18:48,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3268760.0, ans=0.125 2023-11-28 00:19:11,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.65 vs. limit=10.0 2023-11-28 00:19:17,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3268826.6666666665, ans=0.1 2023-11-28 00:19:19,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3268826.6666666665, ans=0.0 2023-11-28 00:19:32,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3268893.3333333335, ans=0.2 2023-11-28 00:19:49,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. 
limit=10.0 2023-11-28 00:20:01,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490350 2023-11-28 00:20:14,377 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9400, loss[loss=0.06559, simple_loss=0.08279, pruned_loss=0.01542, audio_tagging_loss=0.008775, over 16542.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0895, pruned_loss=0.01246, audio_tagging_loss=0.008787, over 3057350.41 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:20:18,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.645e+01 9.230e+01 9.959e+01 1.190e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-28 00:20:29,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3269026.6666666665, ans=0.0 2023-11-28 00:20:49,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3269093.3333333335, ans=0.125 2023-11-28 00:20:52,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3269093.3333333335, ans=0.0 2023-11-28 00:21:26,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2023-11-28 00:21:53,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490400 2023-11-28 00:22:02,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3269293.3333333335, ans=0.07 2023-11-28 00:22:05,766 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9450, loss[loss=0.06758, simple_loss=0.09798, pruned_loss=0.01204, audio_tagging_loss=0.006545, over 15729.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08993, pruned_loss=0.01236, audio_tagging_loss=0.008891, over 3060919.93 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:22:05,879 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:22:21,039 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:23:39,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:23:46,003 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-28 00:23:58,318 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9500, loss[loss=0.06671, simple_loss=0.08842, pruned_loss=0.01055, audio_tagging_loss=0.01195, over 14843.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09079, pruned_loss=0.01245, audio_tagging_loss=0.008876, over 3053746.57 frames. 
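In every optim.py clipping entry, the logged threshold equals Clipping_scale times the middle quartile of the grad norms, e.g. 2.0 * 9.230e+01 = 1.846e+02 in the entry above, and percent-clipped stays at 0.0 because even the largest logged norm (1.190e+02) is below that threshold. A sketch of that inferred rule (not the exact ScaledAdam implementation):

import numpy as np

# Inferred rule behind the "grad-norm quartiles ... threshold=..." entries:
# track recent gradient norms, report their quartiles, and clip at
# clipping_scale times the median.
def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    quartiles = np.percentile(recent_grad_norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * quartiles[2]  # 2 x median
    percent_clipped = 100.0 * np.mean(np.asarray(recent_grad_norms) > threshold)
    return quartiles, threshold, percent_clipped

# With norms spread like the quartiles above (max 1.190e+02), the derived
# threshold 1.846e+02 is never exceeded, hence percent-clipped=0.0.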
], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:24:00,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3269693.3333333335, ans=0.0 2023-11-28 00:24:01,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3269693.3333333335, ans=0.2 2023-11-28 00:24:04,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.586e+01 9.559e+01 1.044e+02 1.238e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 00:24:23,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-28 00:24:35,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3269760.0, ans=0.1 2023-11-28 00:25:18,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3269960.0, ans=0.125 2023-11-28 00:25:25,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-28 00:25:29,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3269960.0, ans=0.0 2023-11-28 00:25:35,661 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9550, loss[loss=0.087, simple_loss=0.1078, pruned_loss=0.0232, audio_tagging_loss=0.009903, over 14856.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09151, pruned_loss=0.01264, audio_tagging_loss=0.008966, over 3045935.60 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:25:40,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3270026.6666666665, ans=0.125 2023-11-28 00:25:54,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3270093.3333333335, ans=0.09899494936611666 2023-11-28 00:26:06,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0 2023-11-28 00:26:21,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=12.0 2023-11-28 00:26:22,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3270160.0, ans=0.2 2023-11-28 00:26:28,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3270226.6666666665, ans=0.125 2023-11-28 00:26:30,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3270226.6666666665, ans=0.125 2023-11-28 00:26:37,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-11-28 00:26:49,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-28 00:26:58,289 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9600, loss[loss=0.07089, simple_loss=0.1032, pruned_loss=0.01149, audio_tagging_loss=0.007816, over 14202.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09118, pruned_loss=0.01264, audio_tagging_loss=0.009065, over 3044609.72 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:27:02,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.793e+01 9.266e+01 1.006e+02 1.228e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 00:27:49,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3270560.0, ans=0.125 2023-11-28 00:27:50,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3270560.0, ans=0.0 2023-11-28 00:27:59,836 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-28 00:28:00,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3270626.6666666665, ans=0.2 2023-11-28 00:28:06,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0 2023-11-28 00:28:08,097 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9650, loss[loss=0.0599, simple_loss=0.08536, pruned_loss=0.009986, audio_tagging_loss=0.007229, over 15294.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09088, pruned_loss=0.01258, audio_tagging_loss=0.009068, over 3045138.44 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:28:45,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3270826.6666666665, ans=0.0 2023-11-28 00:28:45,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3270826.6666666665, ans=0.1 2023-11-28 00:29:03,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3270960.0, ans=0.2 2023-11-28 00:29:05,943 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-28 00:29:14,549 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9700, loss[loss=0.07525, simple_loss=0.1013, pruned_loss=0.01516, audio_tagging_loss=0.009429, over 14291.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.0913, pruned_loss=0.01258, audio_tagging_loss=0.008851, over 3041655.71 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:29:18,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.733e+01 9.513e+01 1.030e+02 1.343e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 00:29:35,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271093.3333333335, ans=0.1 2023-11-28 00:30:10,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490700 2023-11-28 00:30:13,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3271293.3333333335, ans=0.0 2023-11-28 00:30:18,802 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9750, loss[loss=0.0979, simple_loss=0.1289, pruned_loss=0.02772, audio_tagging_loss=0.005737, over 14992.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09143, pruned_loss=0.01271, audio_tagging_loss=0.00869, over 3034375.50 frames. 
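The ScheduledFloat entries record module hyperparameters (balancer probabilities, skip rates, dropout) whose values are a function of batch_count; this deep into training (batch_count around 3.27e6) they have all settled at their final values, hence the recurring ans=0.125 and ans=0.0. A stand-in that captures the idea as piecewise-linear interpolation (an assumption; the actual scaling.py class carries more machinery):

# Hypothetical scheduled hyperparameter: piecewise-linear in batch_count,
# constant outside the listed breakpoints.
def scheduled_float(batch_count: float, points) -> float:
    """points: sorted (batch_count, value) pairs."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k counts has
# long since reached 0.0 at the batch_count values logged above:
assert scheduled_float(3270560.0, [(0.0, 0.5), (20000.0, 0.0)]) == 0.0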
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:30:21,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271360.0, ans=0.1 2023-11-28 00:30:25,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3271360.0, ans=0.125 2023-11-28 00:30:28,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3271360.0, ans=0.1 2023-11-28 00:30:30,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3271426.6666666665, ans=0.0 2023-11-28 00:30:34,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=12.0 2023-11-28 00:30:35,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3271426.6666666665, ans=0.0 2023-11-28 00:30:55,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-28 00:31:03,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-28 00:31:13,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490750 2023-11-28 00:31:20,506 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9800, loss[loss=0.0751, simple_loss=0.09861, pruned_loss=0.01657, audio_tagging_loss=0.009228, over 15118.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09006, pruned_loss=0.01257, audio_tagging_loss=0.008658, over 3029571.09 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:31:23,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.662e+01 9.364e+01 1.024e+02 1.595e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:31:44,058 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:31:51,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-11-28 00:32:13,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-28 00:32:13,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3271960.0, ans=0.125 2023-11-28 00:32:14,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3271960.0, ans=0.05 2023-11-28 00:32:15,811 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 00:32:19,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3272026.6666666665, ans=0.0 2023-11-28 00:32:20,851 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9850, loss[loss=0.07093, simple_loss=0.1024, pruned_loss=0.01302, audio_tagging_loss=0.006697, over 16659.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0898, pruned_loss=0.01251, audio_tagging_loss=0.008573, over 3033111.18 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:32:38,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3272093.3333333335, ans=0.125 2023-11-28 00:32:43,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3272093.3333333335, ans=0.125 2023-11-28 00:32:49,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.98 vs. limit=10.0 2023-11-28 00:32:49,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3272160.0, ans=0.0 2023-11-28 00:32:56,482 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:32:57,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272226.6666666665, ans=0.1 2023-11-28 00:32:57,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3272226.6666666665, ans=0.0 2023-11-28 00:33:12,914 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-28 00:33:13,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3272293.3333333335, ans=0.1 2023-11-28 00:33:20,825 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9900, loss[loss=0.07448, simple_loss=0.09746, pruned_loss=0.01891, audio_tagging_loss=0.006839, over 14365.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09034, pruned_loss=0.01265, audio_tagging_loss=0.008528, over 3033315.07 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:33:24,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.033e+01 9.485e+01 1.050e+02 1.243e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 00:33:31,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-28 00:33:37,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-28 00:33:48,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. 
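The WARNING entries all have the same shape: a one-second AudioSet clip (100 feature frames) whose placeholder transcript tokenizes to 24 BPE tokens, while only 23 encoder frames survive subsampling; a transducer cannot emit more symbols than it has frames, so the cut is excluded. A sketch of that check (the subsampling formula below is fitted to the logged 100 -> 23, not a quote of the exact icefall arithmetic):

# Assumed validity filter matching the WARNING entries: drop cuts whose
# BPE token count exceeds the frame count after subsampling.
def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 8) // 4  # fits the logged 100 -> 23

def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not is_valid_cut(100, 24)  # excluded, exactly as logged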
limit=6.0 2023-11-28 00:33:56,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3272560.0, ans=0.125 2023-11-28 00:34:11,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-28 00:34:12,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3272626.6666666665, ans=0.0 2023-11-28 00:34:14,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-28 00:34:15,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-11-28 00:34:18,417 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 9950, loss[loss=0.07397, simple_loss=0.08903, pruned_loss=0.018, audio_tagging_loss=0.01146, over 15582.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09004, pruned_loss=0.01263, audio_tagging_loss=0.008638, over 3038377.76 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:34:38,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272760.0, ans=0.1 2023-11-28 00:34:39,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3272760.0, ans=0.125 2023-11-28 00:34:43,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3272826.6666666665, ans=0.125 2023-11-28 00:34:44,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3272826.6666666665, ans=0.125 2023-11-28 00:34:47,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3272826.6666666665, ans=0.2 2023-11-28 00:34:58,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3272893.3333333335, ans=0.125 2023-11-28 00:35:09,086 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-28 00:35:16,009 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10000, loss[loss=0.05828, simple_loss=0.08235, pruned_loss=0.008784, audio_tagging_loss=0.008326, over 15406.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08879, pruned_loss=0.01239, audio_tagging_loss=0.008728, over 3040131.40 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:35:19,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.605e+01 9.101e+01 9.831e+01 1.246e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-28 00:35:30,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3273093.3333333335, ans=0.2 2023-11-28 00:35:32,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3273093.3333333335, ans=0.0 2023-11-28 00:35:39,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.59 vs. 
limit=15.0 2023-11-28 00:35:42,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3273160.0, ans=0.125 2023-11-28 00:35:51,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3273226.6666666665, ans=0.125 2023-11-28 00:35:57,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3273226.6666666665, ans=0.0 2023-11-28 00:36:06,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-28 00:36:13,246 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10050, loss[loss=0.0604, simple_loss=0.08125, pruned_loss=0.01027, audio_tagging_loss=0.009498, over 13659.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08889, pruned_loss=0.01245, audio_tagging_loss=0.008761, over 3032175.73 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:36:16,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3273360.0, ans=0.1 2023-11-28 00:36:23,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-28 00:36:36,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.24 vs. limit=22.5 2023-11-28 00:36:54,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3273560.0, ans=0.125 2023-11-28 00:37:05,423 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-28 00:37:11,864 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10100, loss[loss=0.05084, simple_loss=0.05622, pruned_loss=0.009872, audio_tagging_loss=0.01286, over 13919.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08889, pruned_loss=0.01249, audio_tagging_loss=0.008734, over 3033942.28 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:37:15,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3273693.3333333335, ans=0.2 2023-11-28 00:37:17,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.687e+01 9.300e+01 1.008e+02 1.276e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:37:34,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3273826.6666666665, ans=0.09899494936611666 2023-11-28 00:37:43,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3273826.6666666665, ans=0.0 2023-11-28 00:37:43,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3273826.6666666665, ans=0.125 2023-11-28 00:38:00,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3273960.0, ans=0.1 2023-11-28 00:38:01,034 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
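The grad_scale field tracks the fp16 loss scale (use_fp16=True in the configuration): it doubles after a long enough run of overflow-free steps (16.0 -> 32.0 going into batch 10000) and is halved on each overflowing step (down to 8.0 by batch 10050, i.e. two overflows in between). PyTorch's stock dynamic scaler implements this policy; a generic sketch, not necessarily the exact icefall wrapper:

import torch

# Generic dynamic loss scaling with the growth/backoff behaviour seen in
# the grad_scale column; init_scale chosen to match the logged values.
scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,     # a grad_scale value seen in this span
    growth_factor=2.0,   # 16.0 -> 32.0 after enough clean steps
    backoff_factor=0.5,  # halved on overflow: 32.0 -> 16.0 -> 8.0
    growth_interval=2000,
)

# Schematic step:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped internally if gradients overflowed
#   scaler.update()          # adjusts the scale
#   grad_scale = scaler.get_scale()  # the value written to the log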
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:02,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-28 00:38:09,108 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10150, loss[loss=0.06303, simple_loss=0.08814, pruned_loss=0.01001, audio_tagging_loss=0.008959, over 15586.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09023, pruned_loss=0.01263, audio_tagging_loss=0.00867, over 3043108.46 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:38:18,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3274026.6666666665, ans=0.0 2023-11-28 00:38:20,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274093.3333333335, ans=0.1 2023-11-28 00:38:22,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3274093.3333333335, ans=0.2 2023-11-28 00:38:33,811 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:38:38,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-28 00:38:39,107 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:41,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-11-28 00:38:59,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-28 00:39:06,424 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10200, loss[loss=0.04823, simple_loss=0.05938, pruned_loss=0.01063, audio_tagging_loss=0.007904, over 14766.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09079, pruned_loss=0.01282, audio_tagging_loss=0.008809, over 3038270.09 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:39:06,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3274360.0, ans=0.2 2023-11-28 00:39:09,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.76 vs. 
limit=22.5 2023-11-28 00:39:12,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.857e+01 9.633e+01 1.053e+02 1.293e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 00:39:18,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3274426.6666666665, ans=0.2 2023-11-28 00:39:26,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3274426.6666666665, ans=0.1 2023-11-28 00:39:31,059 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:39:38,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274493.3333333335, ans=0.1 2023-11-28 00:39:46,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.36 vs. limit=10.0 2023-11-28 00:39:57,863 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-28 00:40:05,364 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10250, loss[loss=0.06837, simple_loss=0.0947, pruned_loss=0.01172, audio_tagging_loss=0.0093, over 15506.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09004, pruned_loss=0.01281, audio_tagging_loss=0.008882, over 3045301.07 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:40:06,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=12.0 2023-11-28 00:40:30,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3274826.6666666665, ans=0.125 2023-11-28 00:40:34,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3274826.6666666665, ans=0.125 2023-11-28 00:40:40,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2023-11-28 00:40:49,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-11-28 00:40:55,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2023-11-28 00:40:55,942 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-28 00:41:02,321 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10300, loss[loss=0.07314, simple_loss=0.08943, pruned_loss=0.01924, audio_tagging_loss=0.00918, over 15746.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09, pruned_loss=0.01278, audio_tagging_loss=0.008869, over 3041829.14 frames. 
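The Whitening entries compare a covariance statistic of a module's activations against a limit (e.g. metric=6.36 vs. limit=10.0 above); the module only applies a corrective gradient when the metric exceeds its limit, so most entries are purely informational. A plausible form of such a metric, equal to 1.0 when the covariance is a multiple of the identity and growing as the features become less white (an illustration, not the exact scaling.py formula):

import torch

# Illustrative whitening metric for one group of channels: 1.0 iff the
# empirical covariance is proportional to the identity, larger otherwise.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    num_channels = x.shape[-1]
    cov = x.t() @ x / x.shape[0]
    mean_diag = cov.diagonal().mean()
    return ((cov ** 2).sum() / (num_channels * mean_diag ** 2 + 1e-20)).item()

x = torch.randn(10000, 144)                    # well-whitened features
print(whitening_metric(x))                     # close to 1.0
print(whitening_metric(x * torch.rand(144)))   # unequal variances: larger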
], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:41:05,199 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:41:08,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.570e+01 8.818e+01 9.627e+01 1.031e+02 1.268e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 00:41:15,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3275093.3333333335, ans=0.125 2023-11-28 00:41:16,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3275093.3333333335, ans=0.125 2023-11-28 00:41:22,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2023-11-28 00:41:24,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3275160.0, ans=0.125 2023-11-28 00:41:53,400 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491300 2023-11-28 00:41:59,906 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10350, loss[loss=0.0834, simple_loss=0.1031, pruned_loss=0.02219, audio_tagging_loss=0.009639, over 14736.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09137, pruned_loss=0.01303, audio_tagging_loss=0.008916, over 3044392.29 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:42:13,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3275426.6666666665, ans=0.04949747468305833 2023-11-28 00:42:20,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275426.6666666665, ans=0.1 2023-11-28 00:42:26,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3275493.3333333335, ans=0.125 2023-11-28 00:42:27,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2023-11-28 00:42:50,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491350 2023-11-28 00:42:56,803 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10400, loss[loss=0.05866, simple_loss=0.07951, pruned_loss=0.01018, audio_tagging_loss=0.008729, over 16199.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09053, pruned_loss=0.01277, audio_tagging_loss=0.009066, over 3041987.20 frames. 
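The WithLoss entries report the accumulated value of an auxiliary penalty attached to the attention weights; loss-sum=0.000e+00 throughout this span indicates the penalty never fires, which is the behaviour of a hinge-style term that is zero while values stay inside an allowed range. A generic stand-in under that assumption (the actual scaling.py mechanism may differ):

import torch

# Hypothetical hinge penalty: exactly zero while |x| stays within the
# limit, matching the persistent loss-sum=0.000e+00 in the log.
def abs_value_penalty(x: torch.Tensor, limit: float = 25.0) -> torch.Tensor:
    return torch.relu(x.abs() - limit).sum()

attn_scores = 2.0 * torch.randn(4, 8, 100, 100)  # comfortably inside +/-25
print(abs_value_penalty(attn_scores))            # tensor(0.)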
], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:43:02,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.640e+01 9.257e+01 1.001e+02 1.271e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 00:43:12,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3275760.0, ans=0.125 2023-11-28 00:43:16,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3275760.0, ans=0.125 2023-11-28 00:43:30,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3275893.3333333335, ans=0.125 2023-11-28 00:43:31,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3275893.3333333335, ans=0.95 2023-11-28 00:43:35,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3275893.3333333335, ans=0.1 2023-11-28 00:43:42,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3275960.0, ans=0.0 2023-11-28 00:43:46,946 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-28 00:43:54,189 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10450, loss[loss=0.07001, simple_loss=0.09428, pruned_loss=0.01562, audio_tagging_loss=0.007255, over 14766.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09056, pruned_loss=0.01268, audio_tagging_loss=0.009015, over 3040401.72 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:02,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3276026.6666666665, ans=0.125 2023-11-28 00:44:07,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3276093.3333333335, ans=0.125 2023-11-28 00:44:16,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3276160.0, ans=0.125 2023-11-28 00:44:33,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2023-11-28 00:44:35,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3276226.6666666665, ans=0.0 2023-11-28 00:44:44,532 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-28 00:44:51,524 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10500, loss[loss=0.05777, simple_loss=0.08464, pruned_loss=0.007668, audio_tagging_loss=0.007781, over 15785.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08985, pruned_loss=0.01255, audio_tagging_loss=0.008911, over 3049048.44 frames. 
], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:57,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.695e+01 9.363e+01 1.021e+02 1.243e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:45:17,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-28 00:45:25,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3276560.0, ans=0.0 2023-11-28 00:45:30,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2023-11-28 00:45:42,032 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-28 00:45:48,540 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10550, loss[loss=0.05869, simple_loss=0.08199, pruned_loss=0.008572, audio_tagging_loss=0.009126, over 15220.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08999, pruned_loss=0.01258, audio_tagging_loss=0.008843, over 3045779.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:45:51,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2023-11-28 00:45:52,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-28 00:45:56,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=22.5 2023-11-28 00:46:00,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3276760.0, ans=0.0 2023-11-28 00:46:01,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2023-11-28 00:46:14,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3276826.6666666665, ans=0.0 2023-11-28 00:46:39,187 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-28 00:46:45,611 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10600, loss[loss=0.07202, simple_loss=0.1049, pruned_loss=0.01386, audio_tagging_loss=0.005711, over 15075.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08985, pruned_loss=0.01245, audio_tagging_loss=0.008738, over 3041983.15 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:46:51,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.682e+01 9.138e+01 9.881e+01 1.216e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 00:47:00,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3277093.3333333335, ans=0.125 2023-11-28 00:47:23,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3277226.6666666665, ans=0.125 2023-11-28 00:47:28,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.43 vs. 
limit=15.0 2023-11-28 00:47:36,666 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-28 00:47:38,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3277293.3333333335, ans=0.0 2023-11-28 00:47:44,091 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10650, loss[loss=0.06717, simple_loss=0.09972, pruned_loss=0.01068, audio_tagging_loss=0.006629, over 14862.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09082, pruned_loss=0.01248, audio_tagging_loss=0.008709, over 3042158.31 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:47:44,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3277360.0, ans=0.125 2023-11-28 00:47:49,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-28 00:47:54,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3277426.6666666665, ans=0.0 2023-11-28 00:47:56,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3277426.6666666665, ans=0.025 2023-11-28 00:48:06,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3277493.3333333335, ans=0.125 2023-11-28 00:48:12,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3277493.3333333335, ans=0.125 2023-11-28 00:48:24,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3277560.0, ans=0.0 2023-11-28 00:48:34,204 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-28 00:48:35,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-28 00:48:38,269 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:48:40,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3277693.3333333335, ans=0.125 2023-11-28 00:48:41,247 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10700, loss[loss=0.06542, simple_loss=0.08648, pruned_loss=0.01323, audio_tagging_loss=0.008953, over 16051.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09103, pruned_loss=0.01261, audio_tagging_loss=0.008711, over 3048690.44 frames. 
], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:48:46,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.300e+01 9.841e+01 1.574e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:48:47,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3277693.3333333335, ans=0.125 2023-11-28 00:48:59,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3277760.0, ans=0.2 2023-11-28 00:49:01,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3277826.6666666665, ans=0.0 2023-11-28 00:49:13,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3277893.3333333335, ans=0.125 2023-11-28 00:49:30,813 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491700 2023-11-28 00:49:37,223 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10750, loss[loss=0.04963, simple_loss=0.06404, pruned_loss=0.008505, audio_tagging_loss=0.0091, over 15229.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09101, pruned_loss=0.01262, audio_tagging_loss=0.008672, over 3044668.19 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:49:58,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3278093.3333333335, ans=0.125 2023-11-28 00:50:05,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3278160.0, ans=0.1 2023-11-28 00:50:17,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-28 00:50:22,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3278293.3333333335, ans=0.125 2023-11-28 00:50:22,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3278293.3333333335, ans=0.1 2023-11-28 00:50:27,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491750 2023-11-28 00:50:28,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2023-11-28 00:50:33,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3278293.3333333335, ans=0.0 2023-11-28 00:50:35,826 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10800, loss[loss=0.06806, simple_loss=0.08533, pruned_loss=0.01151, audio_tagging_loss=0.01388, over 15083.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0899, pruned_loss=0.01249, audio_tagging_loss=0.008763, over 3043527.44 frames. 
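The tot_loss figures are not a plain epoch average: the fractional frame counts (e.g. "over 3044668.19 frames") point to an exponentially decayed running sum, plausibly tot = tot * (1 - 1/reset_interval) + batch with reset_interval=200 from the configuration, whose steady state holds about 200 batches' worth of frames (roughly 15000 frames per batch * 200 = 3.0e6, the magnitude seen throughout). A sketch under that assumption:

# Assumed running statistic behind tot_loss: an exponentially decayed sum
# of per-batch stats, which makes the accumulated frame count fractional.
RESET_INTERVAL = 200  # from the configuration

def update_tot(tot: dict, batch: dict) -> dict:
    decay = 1.0 - 1.0 / RESET_INTERVAL
    return {k: decay * tot.get(k, 0.0) + batch.get(k, 0.0)
            for k in set(tot) | set(batch)}

# Steady state: frames ~= RESET_INTERVAL * frames_per_batch, i.e. about
# 200 * 15000 = 3.0e6; each reported loss is the decayed loss sum divided
# by the decayed frame count.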
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:50:38,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3278360.0, ans=0.1 2023-11-28 00:50:41,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.647e+01 9.300e+01 1.005e+02 1.391e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:51:12,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-28 00:51:15,302 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:51:18,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3278560.0, ans=0.2 2023-11-28 00:51:26,181 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491800 2023-11-28 00:51:33,556 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10850, loss[loss=0.05789, simple_loss=0.07052, pruned_loss=0.01281, audio_tagging_loss=0.009823, over 15483.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09001, pruned_loss=0.0125, audio_tagging_loss=0.008692, over 3042726.80 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:51:40,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3278693.3333333335, ans=0.07 2023-11-28 00:51:41,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2023-11-28 00:51:50,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3278760.0, ans=0.1 2023-11-28 00:51:53,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3278760.0, ans=0.125 2023-11-28 00:52:01,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3278826.6666666665, ans=0.125 2023-11-28 00:52:08,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3278893.3333333335, ans=0.125 2023-11-28 00:52:09,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=10.0 2023-11-28 00:52:13,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3278893.3333333335, ans=0.125 2023-11-28 00:52:17,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3278960.0, ans=0.0 2023-11-28 00:52:22,966 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491850 2023-11-28 00:52:28,269 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 00:52:29,392 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10900, loss[loss=0.06292, simple_loss=0.07335, pruned_loss=0.01194, audio_tagging_loss=0.0143, over 14824.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08968, pruned_loss=0.01254, audio_tagging_loss=0.008706, over 3039064.97 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:52:29,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3279026.6666666665, ans=0.2 2023-11-28 00:52:33,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3279026.6666666665, ans=0.5 2023-11-28 00:52:34,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.904e+01 9.696e+01 1.053e+02 1.235e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 00:52:43,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3279093.3333333335, ans=0.2 2023-11-28 00:52:53,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279160.0, ans=0.1 2023-11-28 00:52:59,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=3279160.0, ans=12.0 2023-11-28 00:53:13,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3279226.6666666665, ans=0.125 2023-11-28 00:53:19,347 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491900 2023-11-28 00:53:20,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3279293.3333333335, ans=0.125 2023-11-28 00:53:21,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3279293.3333333335, ans=0.125 2023-11-28 00:53:26,256 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 10950, loss[loss=0.06813, simple_loss=0.08891, pruned_loss=0.01314, audio_tagging_loss=0.01053, over 16462.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09014, pruned_loss=0.01248, audio_tagging_loss=0.00869, over 3037538.56 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:53:31,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3279360.0, ans=0.125 2023-11-28 00:53:34,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3279360.0, ans=0.125 2023-11-28 00:53:43,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3279426.6666666665, ans=0.2 2023-11-28 00:53:46,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3279426.6666666665, ans=0.125 2023-11-28 00:53:58,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3279493.3333333335, ans=0.07 2023-11-28 00:54:11,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3279626.6666666665, ans=0.0 2023-11-28 00:54:13,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3279626.6666666665, ans=0.0 2023-11-28 00:54:17,741 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 491950 2023-11-28 00:54:24,173 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11000, loss[loss=0.06592, simple_loss=0.0809, pruned_loss=0.01629, audio_tagging_loss=0.00918, over 14785.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08964, pruned_loss=0.01244, audio_tagging_loss=0.008878, over 3041001.99 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:54:30,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.785e+01 9.323e+01 1.002e+02 1.243e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 00:54:35,504 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:54:47,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3279826.6666666665, ans=10.0 2023-11-28 00:54:59,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3279893.3333333335, ans=0.0 2023-11-28 00:55:14,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-28 00:55:22,891 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11050, loss[loss=0.05329, simple_loss=0.06675, pruned_loss=0.01003, audio_tagging_loss=0.00988, over 14642.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09094, pruned_loss=0.01296, audio_tagging_loss=0.008785, over 3048470.93 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:55:36,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2023-11-28 00:56:12,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-28 00:56:19,112 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11100, loss[loss=0.06231, simple_loss=0.07832, pruned_loss=0.01148, audio_tagging_loss=0.01167, over 13831.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09005, pruned_loss=0.0127, audio_tagging_loss=0.008862, over 3046949.12 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:56:25,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3280360.0, ans=0.125 2023-11-28 00:56:26,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.769e+01 9.313e+01 9.922e+01 1.261e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 00:56:37,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3280426.6666666665, ans=0.125 2023-11-28 00:56:43,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3280493.3333333335, ans=0.0 2023-11-28 00:56:50,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3280493.3333333335, ans=0.0 2023-11-28 00:56:50,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5 2023-11-28 00:56:51,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3280493.3333333335, ans=0.125 2023-11-28 00:57:09,874 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-28 00:57:10,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-11-28 00:57:16,939 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11150, loss[loss=0.06783, simple_loss=0.08854, pruned_loss=0.01437, audio_tagging_loss=0.009194, over 15402.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09003, pruned_loss=0.01275, audio_tagging_loss=0.008854, over 3054822.58 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:57:22,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3280693.3333333335, ans=0.125 2023-11-28 00:57:25,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3280693.3333333335, ans=0.0 2023-11-28 00:57:36,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3280760.0, ans=0.2 2023-11-28 00:57:40,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3280826.6666666665, ans=0.2 2023-11-28 00:57:43,767 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:57:44,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3280826.6666666665, ans=0.1 2023-11-28 00:57:46,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3280826.6666666665, ans=0.125 2023-11-28 00:57:58,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3280893.3333333335, ans=0.125 2023-11-28 00:57:58,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3280893.3333333335, ans=0.125 2023-11-28 00:58:03,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3280960.0, ans=0.0 2023-11-28 00:58:07,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-28 00:58:13,841 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11200, loss[loss=0.07358, simple_loss=0.09834, pruned_loss=0.01583, audio_tagging_loss=0.00858, over 16515.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08958, pruned_loss=0.01282, audio_tagging_loss=0.00899, over 3048998.11 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:58:20,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.951e+01 9.684e+01 1.065e+02 1.269e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 00:58:22,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0 2023-11-28 00:58:27,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=12.0 2023-11-28 00:58:34,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3281093.3333333335, ans=0.0 2023-11-28 00:58:34,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3281093.3333333335, ans=0.125 2023-11-28 00:58:36,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=12.0 2023-11-28 00:58:37,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3281160.0, ans=0.2 2023-11-28 00:58:54,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. 
2023-11-28 00:59:00,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281293.3333333335, ans=0.1 2023-11-28 00:59:04,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-28 00:59:11,114 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11250, loss[loss=0.06067, simple_loss=0.0809, pruned_loss=0.0109, audio_tagging_loss=0.009319, over 15896.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08917, pruned_loss=0.01264, audio_tagging_loss=0.009067, over 3054290.68 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:59:53,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3281560.0, ans=0.125 2023-11-28 01:00:01,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-28 01:00:08,267 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11300, loss[loss=0.06163, simple_loss=0.0834, pruned_loss=0.01016, audio_tagging_loss=0.009767, over 14425.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08921, pruned_loss=0.01265, audio_tagging_loss=0.008976, over 3048170.77 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:00:17,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.951e+01 9.399e+01 1.008e+02 1.489e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 01:00:19,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3281760.0, ans=0.0 2023-11-28 01:00:30,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3281826.6666666665, ans=0.125 2023-11-28 01:00:30,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3281826.6666666665, ans=0.125 2023-11-28 01:00:59,897 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-28 01:01:03,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3281960.0, ans=0.0 2023-11-28 01:01:05,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3282026.6666666665, ans=0.125 2023-11-28 01:01:06,437 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11350, loss[loss=0.05165, simple_loss=0.0661, pruned_loss=0.009759, audio_tagging_loss=0.008846, over 13539.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0888, pruned_loss=0.01266, audio_tagging_loss=0.008884, over 3046661.91 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:01:13,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3282026.6666666665, ans=0.0 2023-11-28 01:01:30,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-11-28 01:01:56,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-28 01:02:02,958 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11400, loss[loss=0.06548, simple_loss=0.0916, pruned_loss=0.01225, audio_tagging_loss=0.007431, over 16084.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08922, pruned_loss=0.01276, audio_tagging_loss=0.008841, over 3048870.29 frames.
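], batch size: 59, lr: 1.64e-03, grad_scale: 16.0

Note on the optim.py "Clipping_scale" entries: the five numbers are quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, and in every entry above the reported threshold equals Clipping_scale times the median, e.g. 2.0 * 9.313e+01 = 1.863e+02 and 2.0 * 9.399e+01 = 1.880e+02; percent-clipped is the share of recent steps whose norm exceeded that threshold. A hedged sketch of median-based clipping; the exact bookkeeping in optim.py may differ:

    import torch

    def clip_by_recent_median(params, recent_norms, clipping_scale=2.0):
        """Scale gradients down when their total norm exceeds
        clipping_scale * median of recent norms (a sketch, not the optimizer)."""
        norms = sorted(recent_norms)
        threshold = clipping_scale * norms[len(norms) // 2]   # as in the logs
        total = torch.sqrt(sum((p.grad.detach() ** 2).sum() for p in params))
        if total > threshold:
            for p in params:
                p.grad.mul_(threshold / total)   # in-place rescale
        return float(total), threshold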
2023-11-28 01:02:11,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.755e+01 9.421e+01 1.020e+02 1.331e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 01:02:20,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3282426.6666666665, ans=0.035 2023-11-28 01:02:36,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2023-11-28 01:02:39,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2023-11-28 01:02:43,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3282560.0, ans=0.0 2023-11-28 01:02:49,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3282626.6666666665, ans=0.125 2023-11-28 01:02:53,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-28 01:03:00,927 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11450, loss[loss=0.07087, simple_loss=0.1021, pruned_loss=0.01227, audio_tagging_loss=0.007552, over 15796.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08902, pruned_loss=0.01261, audio_tagging_loss=0.008806, over 3042990.82 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:03:03,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3282693.3333333335, ans=0.125 2023-11-28 01:03:08,240 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:03:21,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.81 vs. limit=6.0 2023-11-28 01:03:52,007 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-28 01:03:53,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.14 vs. limit=10.0 2023-11-28 01:03:55,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3282960.0, ans=0.125 2023-11-28 01:03:57,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3283026.6666666665, ans=0.125 2023-11-28 01:03:58,521 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11500, loss[loss=0.05603, simple_loss=0.07134, pruned_loss=0.01032, audio_tagging_loss=0.01004, over 16040.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08947, pruned_loss=0.01256, audio_tagging_loss=0.008695, over 3051046.93 frames.
], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:04:04,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3283026.6666666665, ans=0.1 2023-11-28 01:04:06,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.678e+01 9.475e+01 1.027e+02 1.615e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 01:04:29,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3283160.0, ans=0.0 2023-11-28 01:04:29,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3283160.0, ans=0.125 2023-11-28 01:04:49,262 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-28 01:04:55,708 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11550, loss[loss=0.07018, simple_loss=0.09726, pruned_loss=0.01373, audio_tagging_loss=0.007825, over 15247.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09022, pruned_loss=0.01259, audio_tagging_loss=0.008609, over 3054414.52 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:05:01,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3283360.0, ans=0.125 2023-11-28 01:05:11,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283426.6666666665, ans=0.1 2023-11-28 01:05:11,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3283426.6666666665, ans=0.5 2023-11-28 01:05:32,988 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:05:35,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-11-28 01:05:39,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2023-11-28 01:05:41,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3283626.6666666665, ans=0.125 2023-11-28 01:05:46,498 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-28 01:05:47,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3283626.6666666665, ans=10.0 2023-11-28 01:05:53,389 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11600, loss[loss=0.05477, simple_loss=0.06616, pruned_loss=0.009767, audio_tagging_loss=0.01193, over 15947.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09009, pruned_loss=0.01255, audio_tagging_loss=0.0086, over 3062154.39 frames. 
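], batch size: 60, lr: 1.64e-03, grad_scale: 8.0

Note on grad_scale: this is the fp16 dynamic loss-scaling factor. It is halved whenever a step produces inf/NaN gradients (16.0 at batch 11550 above, 8.0 from batch 11600 here) and grows back after a long enough run of clean steps (it is 16.0 again by batch 12000 below). A minimal sketch with torch.cuda.amp.GradScaler, whose backoff/growth parameters implement exactly this halving-and-doubling behavior; model, batch and optimizer are placeholders:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # starting grad_scale
        backoff_factor=0.5,    # halve on inf/NaN gradients
        growth_factor=2.0,     # double after `growth_interval` clean steps
        growth_interval=2000,
    )

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)          # placeholder forward pass
        scaler.scale(loss).backward()    # backprop the scaled loss
        scaler.step(optimizer)           # skipped if gradients overflowed
        scaler.update()                  # adjust the scale for the next step
        return scaler.get_scale()        # the grad_scale value in these logs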
2023-11-28 01:06:03,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.077e+01 9.650e+01 1.023e+02 1.320e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 01:06:04,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0 2023-11-28 01:06:09,463 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:06:24,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3283826.6666666665, ans=0.125 2023-11-28 01:06:29,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3283893.3333333335, ans=0.0 2023-11-28 01:06:34,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3283893.3333333335, ans=0.1 2023-11-28 01:06:41,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3283960.0, ans=0.0 2023-11-28 01:06:43,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-28 01:06:51,010 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11650, loss[loss=0.05711, simple_loss=0.07633, pruned_loss=0.009312, audio_tagging_loss=0.009635, over 16334.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08979, pruned_loss=0.01241, audio_tagging_loss=0.0087, over 3060390.73 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:06:58,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284026.6666666665, ans=0.1 2023-11-28 01:07:07,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3284093.3333333335, ans=0.0 2023-11-28 01:07:12,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3284160.0, ans=0.0 2023-11-28 01:07:25,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-11-28 01:07:38,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3284293.3333333335, ans=0.2 2023-11-28 01:07:40,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3284293.3333333335, ans=0.2 2023-11-28 01:07:41,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-28 01:07:48,464 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11700, loss[loss=0.05763, simple_loss=0.07418, pruned_loss=0.01017, audio_tagging_loss=0.01037, over 14885.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08899, pruned_loss=0.01235, audio_tagging_loss=0.008727, over 3049731.44 frames.
], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:07:56,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3284360.0, ans=0.125 2023-11-28 01:07:58,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.938e+01 9.502e+01 1.017e+02 1.872e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 01:07:59,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3284426.6666666665, ans=0.09899494936611666 2023-11-28 01:08:09,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3284426.6666666665, ans=0.125 2023-11-28 01:08:16,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3284493.3333333335, ans=0.125 2023-11-28 01:08:27,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3284560.0, ans=0.125 2023-11-28 01:08:27,437 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:08:39,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-28 01:08:46,023 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11750, loss[loss=0.06911, simple_loss=0.09597, pruned_loss=0.01363, audio_tagging_loss=0.007492, over 15251.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08995, pruned_loss=0.01247, audio_tagging_loss=0.008745, over 3047690.55 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:08:49,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2023-11-28 01:08:51,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3284693.3333333335, ans=0.125 2023-11-28 01:09:04,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2023-11-28 01:09:10,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.64 vs. limit=15.0 2023-11-28 01:09:10,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3284826.6666666665, ans=0.125 2023-11-28 01:09:36,248 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-28 01:09:37,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-28 01:09:43,365 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11800, loss[loss=0.0723, simple_loss=0.106, pruned_loss=0.01153, audio_tagging_loss=0.007746, over 14810.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08981, pruned_loss=0.01243, audio_tagging_loss=0.008788, over 3043148.05 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:09:53,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.795e+01 9.542e+01 1.022e+02 1.386e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 01:09:59,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-28 01:10:04,106 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:10:06,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3285160.0, ans=0.0 2023-11-28 01:10:30,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3285293.3333333335, ans=0.0 2023-11-28 01:10:31,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3285293.3333333335, ans=0.125 2023-11-28 01:10:33,733 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-28 01:10:40,540 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11850, loss[loss=0.05902, simple_loss=0.07503, pruned_loss=0.01191, audio_tagging_loss=0.009597, over 15220.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08928, pruned_loss=0.01232, audio_tagging_loss=0.008823, over 3047021.61 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:10:58,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0 2023-11-28 01:11:12,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3285493.3333333335, ans=0.2 2023-11-28 01:11:27,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3285626.6666666665, ans=0.0 2023-11-28 01:11:31,364 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-28 01:11:38,309 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11900, loss[loss=0.08014, simple_loss=0.1112, pruned_loss=0.01618, audio_tagging_loss=0.008374, over 15395.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08959, pruned_loss=0.01235, audio_tagging_loss=0.008914, over 3045068.95 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:11:45,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3285693.3333333335, ans=0.0 2023-11-28 01:11:47,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. 
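limit=15.0

Note on the Whitening entries from scaling.py: each reports how far a module's activations are from having a white (isotropic) covariance within each group of channels, together with the limit above which a corrective penalty would apply; all of the metrics printed above are still below their limits. As an illustration of the idea only (not necessarily the exact definition used in scaling.py), a natural whitening metric over the eigenvalues \lambda_i of the d-dimensional activation covariance is

    \mathrm{metric} \;=\; \frac{d \sum_i \lambda_i^2}{\bigl(\sum_i \lambda_i\bigr)^2},

which equals 1 when all eigenvalues are equal (perfectly white) and approaches d when a single direction dominates, so larger values mean more anisotropic activations.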
2023-11-28 01:11:48,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.587e+01 9.286e+01 9.981e+01 1.301e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 01:12:06,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3285826.6666666665, ans=0.125 2023-11-28 01:12:12,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3285893.3333333335, ans=0.125 2023-11-28 01:12:17,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3285893.3333333335, ans=0.125 2023-11-28 01:12:29,097 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-28 01:12:36,109 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 11950, loss[loss=0.07647, simple_loss=0.1099, pruned_loss=0.01536, audio_tagging_loss=0.006175, over 16117.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08972, pruned_loss=0.01229, audio_tagging_loss=0.009021, over 3054729.52 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:12:55,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3286093.3333333335, ans=0.125 2023-11-28 01:13:02,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3286160.0, ans=0.07 2023-11-28 01:13:24,830 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-28 01:13:30,985 INFO [train_asr.py:1235] (1/4) Epoch 41, batch 12000, loss[loss=0.06496, simple_loss=0.08344, pruned_loss=0.01263, audio_tagging_loss=0.01061, over 15500.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09005, pruned_loss=0.0124, audio_tagging_loss=0.009098, over 3054818.94 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:13:30,986 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 01:14:05,670 INFO [train_asr.py:1267] (1/4) Epoch 41, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005209, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 01:14:05,671 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 01:14:09,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3286360.0, ans=0.07 2023-11-28 01:14:15,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.802e+01 9.296e+01 1.010e+02 1.466e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 01:14:15,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3286426.6666666665, ans=0.125 2023-11-28 01:14:20,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3286426.6666666665, ans=0.125 2023-11-28 01:14:25,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3286426.6666666665, ans=0.125 2023-11-28 01:14:28,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.78 vs.
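limit=15.0

Note on the epoch boundary here: tot_loss behaves like an exponentially decayed, frame-weighted running sum that is restarted at each epoch. Late in epoch 41 it covers a steady ~3.05M frames (roughly 200 batches at ~15k frames each), while at epoch 42 batch 0 below it equals the single first batch and at batch 50 it covers only ~695k frames, which is why early-epoch averages are noticeably noisier. A sketch of that bookkeeping, with the 200-batch horizon inferred from the logged frame counts (an assumption, not read from train_asr.py):

    class RunningLoss:
        """Exponentially decayed, frame-weighted loss average (a sketch)."""

        def __init__(self, horizon_batches: int = 200):   # inferred horizon
            self.decay = 1.0 - 1.0 / horizon_batches
            self.loss_sum = 0.0   # decayed sum of (loss * frames)
            self.frames = 0.0     # decayed frame count ("over N frames")

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    # Re-created at each epoch start; at ~15k frames/batch the decayed frame
    # count saturates near 15k * 200 = 3.0M, matching the logs above.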
2023-11-28 01:14:48,426 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 0, loss[loss=0.06775, simple_loss=0.06961, pruned_loss=0.009146, audio_tagging_loss=0.0238, over 15604.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.06961, pruned_loss=0.009146, audio_tagging_loss=0.0238, over 15604.00 frames. ], batch size: 64, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:14:48,427 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 01:15:03,382 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4095, 3.3429, 3.8138, 3.5870], device='cuda:1') 2023-11-28 01:15:22,239 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05771, simple_loss=0.05063, pruned_loss=0.005208, audio_tagging_loss=0.02719, over 4681554.00 frames. 2023-11-28 01:15:22,239 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 01:15:24,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3286513.3333333335, ans=10.0 2023-11-28 01:15:24,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3286513.3333333335, ans=0.95 2023-11-28 01:15:31,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3286513.3333333335, ans=0.5 2023-11-28 01:15:45,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-28 01:16:04,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2023-11-28 01:16:14,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2023-11-28 01:16:17,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3286780.0, ans=0.125 2023-11-28 01:16:19,449 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 50, loss[loss=0.07835, simple_loss=0.1018, pruned_loss=0.01374, audio_tagging_loss=0.01372, over 15639.00 frames. ], tot_loss[loss=0.07439, simple_loss=0.09075, pruned_loss=0.01258, audio_tagging_loss=0.01643, over 694660.46 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:16:24,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3286846.6666666665, ans=0.09899494936611666 2023-11-28 01:16:42,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3286980.0, ans=0.0 2023-11-28 01:16:43,997 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-28 01:16:47,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3286980.0, ans=0.125 2023-11-28 01:16:50,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2023-11-28 01:16:59,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs.
limit=22.5 2023-11-28 01:17:01,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.634e+01 1.031e+02 1.127e+02 1.457e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-28 01:17:16,856 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 100, loss[loss=0.07922, simple_loss=0.1005, pruned_loss=0.01498, audio_tagging_loss=0.01402, over 15578.00 frames. ], tot_loss[loss=0.0747, simple_loss=0.09201, pruned_loss=0.01277, audio_tagging_loss=0.01592, over 1212073.94 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:17:35,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3287246.6666666665, ans=0.0 2023-11-28 01:17:42,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-28 01:17:59,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3287380.0, ans=0.125 2023-11-28 01:18:15,157 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 150, loss[loss=0.07362, simple_loss=0.09499, pruned_loss=0.01227, audio_tagging_loss=0.01385, over 15829.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.09198, pruned_loss=0.01286, audio_tagging_loss=0.01436, over 1618034.02 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:18:15,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3287513.3333333335, ans=0.1 2023-11-28 01:18:17,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3287513.3333333335, ans=0.0 2023-11-28 01:18:26,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3287580.0, ans=0.125 2023-11-28 01:18:31,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3287580.0, ans=0.125 2023-11-28 01:18:35,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3287580.0, ans=0.125 2023-11-28 01:18:38,982 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-28 01:18:44,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3287646.6666666665, ans=0.125 2023-11-28 01:18:57,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.959e+01 9.616e+01 1.058e+02 1.322e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 01:19:12,892 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 200, loss[loss=0.06565, simple_loss=0.09041, pruned_loss=0.01155, audio_tagging_loss=0.008896, over 15143.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.08915, pruned_loss=0.01252, audio_tagging_loss=0.01294, over 1936026.29 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:19:13,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.01 vs. 
limit=15.0 2023-11-28 01:19:25,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3287913.3333333335, ans=0.1 2023-11-28 01:19:26,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3287913.3333333335, ans=0.0 2023-11-28 01:19:37,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-28 01:19:41,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3287980.0, ans=0.125 2023-11-28 01:19:54,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3288046.6666666665, ans=0.125 2023-11-28 01:20:09,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3288113.3333333335, ans=0.125 2023-11-28 01:20:11,199 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 250, loss[loss=0.07081, simple_loss=0.09616, pruned_loss=0.01411, audio_tagging_loss=0.008615, over 15631.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.0898, pruned_loss=0.01251, audio_tagging_loss=0.01188, over 2180273.29 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:20:11,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=12.0 2023-11-28 01:20:12,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3288180.0, ans=0.0 2023-11-28 01:20:24,307 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:20:31,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3288246.6666666665, ans=0.125 2023-11-28 01:20:32,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3288246.6666666665, ans=0.0 2023-11-28 01:20:36,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-28 01:20:37,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0 2023-11-28 01:20:46,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2023-11-28 01:20:49,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3288380.0, ans=0.0 2023-11-28 01:20:49,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3288380.0, ans=0.125 2023-11-28 01:20:52,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.203e+01 9.691e+01 1.057e+02 1.267e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 01:20:53,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3288380.0, ans=0.125 2023-11-28 01:21:09,022 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 300, loss[loss=0.05569, simple_loss=0.0729, pruned_loss=0.01012, audio_tagging_loss=0.009115, over 14508.00 frames. 
], tot_loss[loss=0.06884, simple_loss=0.09037, pruned_loss=0.01273, audio_tagging_loss=0.01092, over 2373457.15 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:21:09,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3288513.3333333335, ans=0.0 2023-11-28 01:21:23,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2023-11-28 01:21:33,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-28 01:21:44,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3288713.3333333335, ans=0.0 2023-11-28 01:22:04,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3288780.0, ans=0.2 2023-11-28 01:22:06,957 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 350, loss[loss=0.06449, simple_loss=0.08201, pruned_loss=0.01238, audio_tagging_loss=0.0111, over 15244.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.08974, pruned_loss=0.0127, audio_tagging_loss=0.01045, over 2523177.51 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:22:30,715 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-28 01:22:49,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.671e+01 9.361e+01 1.014e+02 1.227e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 01:22:53,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289113.3333333335, ans=0.1 2023-11-28 01:23:03,885 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 400, loss[loss=0.07555, simple_loss=0.1038, pruned_loss=0.01673, audio_tagging_loss=0.006921, over 15942.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.08969, pruned_loss=0.01264, audio_tagging_loss=0.0101, over 2640212.10 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:23:27,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-28 01:23:35,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. 
limit=22.5 2023-11-28 01:23:37,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-28 01:23:43,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3289380.0, ans=0.125 2023-11-28 01:23:48,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3289380.0, ans=0.125 2023-11-28 01:23:50,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-28 01:23:51,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-28 01:23:54,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-28 01:23:56,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-28 01:23:57,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2023-11-28 01:24:02,007 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 450, loss[loss=0.08111, simple_loss=0.1042, pruned_loss=0.02155, audio_tagging_loss=0.007449, over 15528.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.08967, pruned_loss=0.01273, audio_tagging_loss=0.009819, over 2732572.78 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:24:06,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3289513.3333333335, ans=0.05 2023-11-28 01:24:10,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-28 01:24:17,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:24:18,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3289580.0, ans=0.2 2023-11-28 01:24:26,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-28 01:24:36,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289713.3333333335, ans=0.1 2023-11-28 01:24:45,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.828e+01 9.254e+01 1.009e+02 1.850e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 01:24:50,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289780.0, ans=0.1 2023-11-28 01:24:53,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2023-11-28 01:24:59,855 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 500, loss[loss=0.08336, simple_loss=0.1296, pruned_loss=0.01159, audio_tagging_loss=0.00698, over 15199.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09031, pruned_loss=0.01259, audio_tagging_loss=0.00951, over 2794340.24 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:25:12,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:13,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:15,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1 2023-11-28 01:25:17,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3289913.3333333335, ans=0.0 2023-11-28 01:25:23,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-28 01:25:31,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3289980.0, ans=0.0 2023-11-28 01:25:33,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3290046.6666666665, ans=0.0 2023-11-28 01:25:57,461 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 550, loss[loss=0.05977, simple_loss=0.07854, pruned_loss=0.01102, audio_tagging_loss=0.009475, over 15090.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08925, pruned_loss=0.01242, audio_tagging_loss=0.009354, over 2844648.31 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:26:11,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3290246.6666666665, ans=0.125 2023-11-28 01:26:21,507 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-28 01:26:31,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3290380.0, ans=0.5 2023-11-28 01:26:41,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.718e+01 9.476e+01 1.036e+02 1.288e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 01:26:41,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-11-28 01:26:55,496 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 600, loss[loss=0.04628, simple_loss=0.06026, pruned_loss=0.007363, audio_tagging_loss=0.00879, over 14110.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09002, pruned_loss=0.0126, audio_tagging_loss=0.009221, over 2888747.36 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:27:02,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3290513.3333333335, ans=0.015 2023-11-28 01:27:02,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3290513.3333333335, ans=0.0 2023-11-28 01:27:03,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3290513.3333333335, ans=0.0 2023-11-28 01:27:12,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.85 vs. 
limit=10.0 2023-11-28 01:27:12,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3290580.0, ans=0.125 2023-11-28 01:27:20,236 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-28 01:27:20,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3290646.6666666665, ans=15.0 2023-11-28 01:27:21,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3290646.6666666665, ans=0.025 2023-11-28 01:27:32,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2023-11-28 01:27:35,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3290713.3333333335, ans=0.2 2023-11-28 01:27:53,959 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 650, loss[loss=0.06418, simple_loss=0.09641, pruned_loss=0.009453, audio_tagging_loss=0.006525, over 14985.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08961, pruned_loss=0.01233, audio_tagging_loss=0.009224, over 2923146.63 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:27:55,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3290846.6666666665, ans=0.1 2023-11-28 01:28:13,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3290913.3333333335, ans=0.5 2023-11-28 01:28:17,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-28 01:28:23,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3290980.0, ans=0.0 2023-11-28 01:28:24,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3290980.0, ans=0.125 2023-11-28 01:28:38,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.722e+01 9.347e+01 9.970e+01 1.370e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 01:28:49,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=22.5 2023-11-28 01:28:51,645 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 700, loss[loss=0.07674, simple_loss=0.1137, pruned_loss=0.0142, audio_tagging_loss=0.0057, over 14285.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09, pruned_loss=0.01238, audio_tagging_loss=0.009074, over 2950724.74 frames. 
], batch size: 51, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:29:12,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3291246.6666666665, ans=0.05 2023-11-28 01:29:15,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-28 01:29:26,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3291380.0, ans=0.125 2023-11-28 01:29:30,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3291380.0, ans=0.05 2023-11-28 01:29:38,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3291446.6666666665, ans=0.125 2023-11-28 01:29:41,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. limit=10.0 2023-11-28 01:29:43,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3291446.6666666665, ans=0.125 2023-11-28 01:29:49,682 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 750, loss[loss=0.06595, simple_loss=0.09361, pruned_loss=0.01185, audio_tagging_loss=0.007299, over 16746.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09131, pruned_loss=0.01253, audio_tagging_loss=0.009007, over 2977701.19 frames. ], batch size: 64, lr: 1.62e-03, grad_scale: 4.0 2023-11-28 01:29:51,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-11-28 01:30:06,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3291580.0, ans=0.125 2023-11-28 01:30:13,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-28 01:30:15,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3291646.6666666665, ans=0.125 2023-11-28 01:30:29,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3291713.3333333335, ans=0.0 2023-11-28 01:30:36,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.375e+01 9.953e+01 1.444e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 01:30:45,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3291780.0, ans=0.125 2023-11-28 01:30:46,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3291846.6666666665, ans=0.125 2023-11-28 01:30:47,085 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 800, loss[loss=0.07542, simple_loss=0.1127, pruned_loss=0.01097, audio_tagging_loss=0.008102, over 15210.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09156, pruned_loss=0.01261, audio_tagging_loss=0.009035, over 2990952.08 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:30:53,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291846.6666666665, ans=0.1 2023-11-28 01:30:54,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3291846.6666666665, ans=0.2 2023-11-28 01:31:05,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3291913.3333333335, ans=0.125 2023-11-28 01:31:06,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3291913.3333333335, ans=0.125 2023-11-28 01:31:11,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-28 01:31:16,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3291980.0, ans=0.0 2023-11-28 01:31:16,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3291980.0, ans=0.125 2023-11-28 01:31:27,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3292046.6666666665, ans=0.05 2023-11-28 01:31:32,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3292113.3333333335, ans=0.0 2023-11-28 01:31:39,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3292113.3333333335, ans=0.0 2023-11-28 01:31:42,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3292113.3333333335, ans=0.125 2023-11-28 01:31:45,318 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 850, loss[loss=0.05176, simple_loss=0.06245, pruned_loss=0.01119, audio_tagging_loss=0.009349, over 14634.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09117, pruned_loss=0.0126, audio_tagging_loss=0.009113, over 2999933.25 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:31:57,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3292246.6666666665, ans=0.125 2023-11-28 01:31:58,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3292246.6666666665, ans=0.2 2023-11-28 01:31:59,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3292246.6666666665, ans=0.0 2023-11-28 01:32:04,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3292246.6666666665, ans=0.0 2023-11-28 01:32:09,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-28 01:32:11,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.80 vs. 
limit=22.5 2023-11-28 01:32:13,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3292313.3333333335, ans=0.04949747468305833 2023-11-28 01:32:23,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3292380.0, ans=0.125 2023-11-28 01:32:24,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3292380.0, ans=0.0 2023-11-28 01:32:31,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.796e+01 9.707e+01 1.029e+02 1.774e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-28 01:32:40,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2023-11-28 01:32:43,118 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 900, loss[loss=0.05825, simple_loss=0.07807, pruned_loss=0.008765, audio_tagging_loss=0.01044, over 15394.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09093, pruned_loss=0.01261, audio_tagging_loss=0.009186, over 3015894.48 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:33:01,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3292580.0, ans=0.125 2023-11-28 01:33:07,804 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-28 01:33:10,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3292646.6666666665, ans=0.1 2023-11-28 01:33:18,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3292713.3333333335, ans=0.125 2023-11-28 01:33:41,070 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 950, loss[loss=0.05769, simple_loss=0.08026, pruned_loss=0.009, audio_tagging_loss=0.008563, over 15673.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09029, pruned_loss=0.01244, audio_tagging_loss=0.009092, over 3027107.30 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:33:51,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3292913.3333333335, ans=0.2 2023-11-28 01:34:05,474 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-28 01:34:06,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-28 01:34:08,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3292980.0, ans=0.2 2023-11-28 01:34:16,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-28 01:34:27,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.099e+01 8.712e+01 9.268e+01 9.903e+01 1.259e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 01:34:36,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.85 vs. 
limit=15.0 2023-11-28 01:34:36,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293113.3333333335, ans=0.1 2023-11-28 01:34:38,904 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1000, loss[loss=0.06479, simple_loss=0.08495, pruned_loss=0.01417, audio_tagging_loss=0.008138, over 14661.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09117, pruned_loss=0.01271, audio_tagging_loss=0.008881, over 3030602.53 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:34:41,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3293180.0, ans=0.0 2023-11-28 01:35:01,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3293313.3333333335, ans=0.125 2023-11-28 01:35:03,362 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-28 01:35:05,797 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:35:06,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3293313.3333333335, ans=15.0 2023-11-28 01:35:15,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-28 01:35:19,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-28 01:35:20,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3293380.0, ans=0.125 2023-11-28 01:35:37,063 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1050, loss[loss=0.05967, simple_loss=0.0839, pruned_loss=0.008409, audio_tagging_loss=0.009311, over 15547.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09077, pruned_loss=0.01256, audio_tagging_loss=0.008824, over 3040973.46 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:35:42,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-28 01:35:51,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3293580.0, ans=0.125 2023-11-28 01:35:58,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.87 vs. 
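limit=12.0

Note on the "Exclude cut" WARNING entries (one a few lines above, another just below): the AudioSet cuts carry a dummy placeholder transcript, and very short cuts are dropped because the transducer loss is undefined when the label sequence is longer than the encoder output. Here an apparently 1-second cut has 100 feature frames, only 23 frames after the roughly 4x subsampling front-end, versus 24 BPE tokens, so it is excluded. A sketch of such a filter; the function name and the exact subsampling formula are approximations:

    SUBSAMPLING_FACTOR = 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Keep a cut only if the encoder output is at least as long as the
        token sequence (required by the RNN-T/pruned transducer loss)."""
        # Approximate length after the convolutional front-end; the true
        # formula depends on its kernels and strides.
        frames_after = num_frames // SUBSAMPLING_FACTOR - 2
        return frames_after >= num_tokens

    print(keep_cut(100, 24))   # False -> excluded, matching the WARNING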
limit=12.0 2023-11-28 01:35:59,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293646.6666666665, ans=0.1 2023-11-28 01:36:01,766 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-28 01:36:14,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3293713.3333333335, ans=0.1 2023-11-28 01:36:20,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293713.3333333335, ans=0.1 2023-11-28 01:36:23,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.510e+01 9.155e+01 1.003e+02 1.223e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-28 01:36:33,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3293780.0, ans=0.0 2023-11-28 01:36:33,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3293780.0, ans=0.125 2023-11-28 01:36:35,327 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1100, loss[loss=0.0517, simple_loss=0.06623, pruned_loss=0.006863, audio_tagging_loss=0.01172, over 16325.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09021, pruned_loss=0.01251, audio_tagging_loss=0.008745, over 3038197.30 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:36:39,881 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:36:40,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3293846.6666666665, ans=0.0 2023-11-28 01:36:41,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3293846.6666666665, ans=10.0 2023-11-28 01:36:56,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2023-11-28 01:36:58,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-28 01:37:17,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3294046.6666666665, ans=0.125 2023-11-28 01:37:32,950 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1150, loss[loss=0.07358, simple_loss=0.1022, pruned_loss=0.01477, audio_tagging_loss=0.00772, over 16694.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08996, pruned_loss=0.01247, audio_tagging_loss=0.008702, over 3041385.65 frames. 
], batch size: 63, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:37:36,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3294180.0, ans=0.1 2023-11-28 01:37:57,727 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-28 01:38:18,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 8.708e+01 9.340e+01 1.012e+02 1.442e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:38:27,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3294446.6666666665, ans=0.5 2023-11-28 01:38:29,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3294513.3333333335, ans=0.0 2023-11-28 01:38:29,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0 2023-11-28 01:38:29,978 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1200, loss[loss=0.0753, simple_loss=0.1128, pruned_loss=0.01401, audio_tagging_loss=0.004874, over 15634.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09111, pruned_loss=0.01273, audio_tagging_loss=0.008591, over 3042147.88 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:38:32,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3294513.3333333335, ans=0.2 2023-11-28 01:38:51,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3294580.0, ans=0.125 2023-11-28 01:38:54,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-28 01:39:05,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3294713.3333333335, ans=0.2 2023-11-28 01:39:05,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-28 01:39:12,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2023-11-28 01:39:29,036 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1250, loss[loss=0.07621, simple_loss=0.1033, pruned_loss=0.01661, audio_tagging_loss=0.007969, over 14732.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09074, pruned_loss=0.01267, audio_tagging_loss=0.008491, over 3043390.19 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:39:29,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3294846.6666666665, ans=0.125 2023-11-28 01:39:37,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-28 01:39:45,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294913.3333333335, ans=0.1 2023-11-28 01:39:45,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. 
limit=22.5 2023-11-28 01:39:52,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-28 01:39:53,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294980.0, ans=0.1 2023-11-28 01:40:08,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3295046.6666666665, ans=0.0 2023-11-28 01:40:15,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.552e+01 9.215e+01 9.963e+01 1.305e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 01:40:15,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2023-11-28 01:40:16,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3295113.3333333335, ans=0.05 2023-11-28 01:40:16,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3295113.3333333335, ans=0.125 2023-11-28 01:40:26,881 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1300, loss[loss=0.0765, simple_loss=0.1084, pruned_loss=0.01316, audio_tagging_loss=0.009165, over 15137.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09088, pruned_loss=0.01261, audio_tagging_loss=0.008511, over 3049142.69 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:40:30,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3295180.0, ans=0.0 2023-11-28 01:40:34,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3295180.0, ans=0.2 2023-11-28 01:40:42,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2023-11-28 01:40:50,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-28 01:41:24,000 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1350, loss[loss=0.05817, simple_loss=0.08272, pruned_loss=0.007806, audio_tagging_loss=0.009001, over 15558.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09107, pruned_loss=0.01258, audio_tagging_loss=0.008555, over 3049572.72 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:41:30,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3295513.3333333335, ans=0.125 2023-11-28 01:41:49,426 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-28 01:41:49,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:41:49,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:41:49,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:41:52,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.83 vs. 
limit=15.0 2023-11-28 01:42:03,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3295713.3333333335, ans=0.2 2023-11-28 01:42:08,161 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:42:10,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.515e+01 9.138e+01 9.769e+01 1.555e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 01:42:10,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3295780.0, ans=0.2 2023-11-28 01:42:22,292 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1400, loss[loss=0.05104, simple_loss=0.07085, pruned_loss=0.006093, audio_tagging_loss=0.009526, over 14490.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09103, pruned_loss=0.01253, audio_tagging_loss=0.008576, over 3055864.68 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:42:46,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-28 01:43:05,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0 2023-11-28 01:43:20,764 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1450, loss[loss=0.06488, simple_loss=0.08613, pruned_loss=0.01321, audio_tagging_loss=0.008599, over 15909.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09112, pruned_loss=0.01269, audio_tagging_loss=0.008653, over 3063448.13 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:43:44,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-28 01:43:58,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2023-11-28 01:44:06,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 8.990e+01 9.437e+01 1.012e+02 1.630e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 01:44:08,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0 2023-11-28 01:44:14,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3296446.6666666665, ans=0.125 2023-11-28 01:44:17,588 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1500, loss[loss=0.06565, simple_loss=0.08391, pruned_loss=0.01474, audio_tagging_loss=0.008953, over 14926.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09109, pruned_loss=0.01275, audio_tagging_loss=0.008776, over 3058837.27 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:44:19,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.81 vs. 
limit=15.0 2023-11-28 01:44:30,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3296580.0, ans=0.125 2023-11-28 01:44:42,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-28 01:45:00,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3296713.3333333335, ans=0.0 2023-11-28 01:45:04,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-28 01:45:10,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3296780.0, ans=0.125 2023-11-28 01:45:15,920 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1550, loss[loss=0.06391, simple_loss=0.08402, pruned_loss=0.01207, audio_tagging_loss=0.009832, over 15052.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09125, pruned_loss=0.01289, audio_tagging_loss=0.008877, over 3052630.51 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:45:21,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3296846.6666666665, ans=0.125 2023-11-28 01:45:35,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3296913.3333333335, ans=0.125 2023-11-28 01:45:35,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3296913.3333333335, ans=0.0 2023-11-28 01:45:40,199 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-28 01:45:59,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3297046.6666666665, ans=0.2 2023-11-28 01:46:00,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3297046.6666666665, ans=0.0 2023-11-28 01:46:01,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.675e+01 9.125e+01 9.756e+01 1.252e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-28 01:46:06,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3297113.3333333335, ans=0.125 2023-11-28 01:46:13,994 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1600, loss[loss=0.0725, simple_loss=0.09782, pruned_loss=0.01554, audio_tagging_loss=0.008055, over 15676.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09125, pruned_loss=0.01272, audio_tagging_loss=0.008903, over 3057533.35 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:46:20,788 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:46:20,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3297180.0, ans=0.125 2023-11-28 01:46:27,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. 
limit=15.0 2023-11-28 01:46:29,570 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:46:36,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-28 01:46:37,077 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-28 01:46:43,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3297313.3333333335, ans=0.0 2023-11-28 01:47:02,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3297446.6666666665, ans=0.125 2023-11-28 01:47:08,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3297446.6666666665, ans=0.0 2023-11-28 01:47:10,411 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1650, loss[loss=0.06322, simple_loss=0.09231, pruned_loss=0.01059, audio_tagging_loss=0.006477, over 14560.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09064, pruned_loss=0.01266, audio_tagging_loss=0.009028, over 3048070.39 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:47:14,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2023-11-28 01:47:26,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3297580.0, ans=0.035 2023-11-28 01:47:33,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2023-11-28 01:47:34,429 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-28 01:47:57,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.801e+01 9.426e+01 1.009e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 01:48:08,366 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1700, loss[loss=0.08271, simple_loss=0.1093, pruned_loss=0.0169, audio_tagging_loss=0.01114, over 15120.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09032, pruned_loss=0.01255, audio_tagging_loss=0.009035, over 3042060.16 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:48:08,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3297846.6666666665, ans=0.2 2023-11-28 01:48:18,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3297913.3333333335, ans=0.125 2023-11-28 01:48:32,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-28 01:48:32,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-28 01:49:01,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298113.3333333335, ans=0.1 2023-11-28 01:49:04,881 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1750, loss[loss=0.07407, simple_loss=0.1037, pruned_loss=0.01496, audio_tagging_loss=0.007249, over 15326.00 frames. 
], tot_loss[loss=0.06565, simple_loss=0.08901, pruned_loss=0.01222, audio_tagging_loss=0.008925, over 3041141.85 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:49:12,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3298180.0, ans=0.0 2023-11-28 01:49:15,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=3298180.0, ans=22.5 2023-11-28 01:49:27,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-28 01:49:29,068 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-28 01:49:29,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2023-11-28 01:49:32,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3298313.3333333335, ans=0.125 2023-11-28 01:49:40,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3298380.0, ans=10.0 2023-11-28 01:49:40,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3298380.0, ans=0.0 2023-11-28 01:49:44,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3298380.0, ans=0.125 2023-11-28 01:49:52,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.816e+01 9.529e+01 1.029e+02 1.383e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 01:49:53,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3298446.6666666665, ans=22.5 2023-11-28 01:49:54,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3298446.6666666665, ans=0.1 2023-11-28 01:50:02,963 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1800, loss[loss=0.09183, simple_loss=0.1266, pruned_loss=0.02149, audio_tagging_loss=0.007052, over 15804.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.0882, pruned_loss=0.01212, audio_tagging_loss=0.008872, over 3034456.90 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:50:09,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.58 vs. 
limit=15.0 2023-11-28 01:50:10,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-28 01:50:11,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3298513.3333333335, ans=0.2 2023-11-28 01:50:23,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3298580.0, ans=0.125 2023-11-28 01:50:25,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3298646.6666666665, ans=0.125 2023-11-28 01:50:26,970 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-28 01:50:52,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2023-11-28 01:50:54,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-11-28 01:50:55,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3298780.0, ans=0.125 2023-11-28 01:50:57,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3298780.0, ans=0.2 2023-11-28 01:51:00,591 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1850, loss[loss=0.08035, simple_loss=0.1135, pruned_loss=0.01334, audio_tagging_loss=0.01024, over 15085.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08875, pruned_loss=0.01223, audio_tagging_loss=0.008821, over 3037001.84 frames. 
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:51:04,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3298846.6666666665, ans=0.125 2023-11-28 01:51:07,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3298846.6666666665, ans=0.125 2023-11-28 01:51:22,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3298913.3333333335, ans=0.2 2023-11-28 01:51:25,223 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-28 01:51:40,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:40,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:42,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3299046.6666666665, ans=0.0 2023-11-28 01:51:48,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.704e+01 9.342e+01 1.015e+02 1.516e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:51:55,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3299113.3333333335, ans=0.125 2023-11-28 01:51:58,711 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1900, loss[loss=0.0508, simple_loss=0.07082, pruned_loss=0.005475, audio_tagging_loss=0.009912, over 15874.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08818, pruned_loss=0.01207, audio_tagging_loss=0.008772, over 3042255.24 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:52:22,947 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-28 01:52:56,202 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 1950, loss[loss=0.07017, simple_loss=0.09416, pruned_loss=0.01286, audio_tagging_loss=0.01023, over 15027.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08839, pruned_loss=0.01214, audio_tagging_loss=0.008749, over 3045691.08 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:52:56,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5 2023-11-28 01:53:20,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-28 01:53:28,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3299646.6666666665, ans=0.125 2023-11-28 01:53:36,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3299713.3333333335, ans=0.1 2023-11-28 01:53:43,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.727e+01 9.410e+01 1.013e+02 1.318e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 01:53:45,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3299780.0, ans=0.125 2023-11-28 01:53:50,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. 
limit=22.5 2023-11-28 01:53:53,613 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2000, loss[loss=0.05741, simple_loss=0.0723, pruned_loss=0.01045, audio_tagging_loss=0.01081, over 15085.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.0884, pruned_loss=0.01203, audio_tagging_loss=0.008696, over 3052499.03 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:54:03,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3299846.6666666665, ans=10.0 2023-11-28 01:54:17,869 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-28 01:54:22,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3299980.0, ans=0.0 2023-11-28 01:54:38,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3300113.3333333335, ans=0.0 2023-11-28 01:54:46,348 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:54:51,452 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2050, loss[loss=0.06075, simple_loss=0.07806, pruned_loss=0.01219, audio_tagging_loss=0.009533, over 15116.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08848, pruned_loss=0.01197, audio_tagging_loss=0.008618, over 3046385.95 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:54:58,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3300180.0, ans=0.125 2023-11-28 01:55:02,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-11-28 01:55:03,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2023-11-28 01:55:15,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-28 01:55:26,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3300380.0, ans=0.2 2023-11-28 01:55:33,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3300380.0, ans=0.1 2023-11-28 01:55:36,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3300446.6666666665, ans=0.0 2023-11-28 01:55:38,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.786e+01 9.334e+01 1.004e+02 1.293e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 01:55:49,316 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2100, loss[loss=0.06609, simple_loss=0.09398, pruned_loss=0.009613, audio_tagging_loss=0.009483, over 14952.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08903, pruned_loss=0.01219, audio_tagging_loss=0.008641, over 3043276.92 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:56:12,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3300646.6666666665, ans=0.0 2023-11-28 01:56:14,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-28 01:56:16,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3300646.6666666665, ans=0.0 2023-11-28 01:56:40,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3300780.0, ans=0.2 2023-11-28 01:56:41,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3300780.0, ans=0.125 2023-11-28 01:56:47,194 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2150, loss[loss=0.06288, simple_loss=0.08311, pruned_loss=0.01277, audio_tagging_loss=0.008551, over 15932.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08936, pruned_loss=0.01236, audio_tagging_loss=0.008652, over 3042048.11 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:57:11,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-28 01:57:18,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=22.5 2023-11-28 01:57:22,685 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:57:23,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-28 01:57:28,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3301046.6666666665, ans=0.125 2023-11-28 01:57:34,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.726e+01 9.403e+01 1.016e+02 1.279e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 01:57:35,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3301113.3333333335, ans=0.2 2023-11-28 01:57:38,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3301113.3333333335, ans=0.0 2023-11-28 01:57:45,116 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2200, loss[loss=0.05913, simple_loss=0.0856, pruned_loss=0.008853, audio_tagging_loss=0.007479, over 15228.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08917, pruned_loss=0.01236, audio_tagging_loss=0.00864, over 3039886.61 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:57:49,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3301180.0, ans=0.125 2023-11-28 01:58:08,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-28 01:58:12,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3301313.3333333335, ans=0.125 2023-11-28 01:58:18,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2023-11-28 01:58:25,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3301380.0, ans=15.0 2023-11-28 01:58:31,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0 2023-11-28 01:58:43,006 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2250, loss[loss=0.0675, simple_loss=0.1008, pruned_loss=0.01072, audio_tagging_loss=0.006393, over 14890.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08975, pruned_loss=0.01242, audio_tagging_loss=0.008641, over 3041127.33 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:58:54,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3301580.0, ans=0.125 2023-11-28 01:59:05,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3301646.6666666665, ans=0.07 2023-11-28 01:59:07,459 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-28 01:59:29,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.696e+01 9.309e+01 9.993e+01 1.259e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 01:59:36,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3301780.0, ans=0.125 2023-11-28 01:59:39,877 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2300, loss[loss=0.07293, simple_loss=0.09947, pruned_loss=0.01415, audio_tagging_loss=0.00905, over 15537.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09006, pruned_loss=0.01236, audio_tagging_loss=0.008627, over 3049116.44 frames. 
], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:59:43,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3301846.6666666665, ans=0.125 2023-11-28 01:59:52,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3301913.3333333335, ans=0.0 2023-11-28 01:59:57,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301913.3333333335, ans=0.1 2023-11-28 02:00:04,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-28 02:00:15,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302046.6666666665, ans=0.1 2023-11-28 02:00:16,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3302046.6666666665, ans=0.125 2023-11-28 02:00:21,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3302046.6666666665, ans=0.0 2023-11-28 02:00:22,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3302046.6666666665, ans=0.0 2023-11-28 02:00:32,691 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:00:37,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3302180.0, ans=0.2 2023-11-28 02:00:38,608 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2350, loss[loss=0.09011, simple_loss=0.1154, pruned_loss=0.02461, audio_tagging_loss=0.007806, over 15298.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09013, pruned_loss=0.01229, audio_tagging_loss=0.008696, over 3047634.98 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:00:46,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3302180.0, ans=0.125 2023-11-28 02:01:02,419 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-28 02:01:02,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3302313.3333333335, ans=0.125 2023-11-28 02:01:09,057 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:01:16,479 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:01:21,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3302380.0, ans=0.2 2023-11-28 02:01:25,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.943e+01 9.346e+01 1.021e+02 1.230e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:01:26,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.89 vs. limit=10.0 2023-11-28 02:01:26,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3302446.6666666665, ans=0.125 2023-11-28 02:01:28,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3302446.6666666665, ans=0.125 2023-11-28 02:01:36,078 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2400, loss[loss=0.08428, simple_loss=0.1171, pruned_loss=0.01811, audio_tagging_loss=0.007625, over 14138.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09084, pruned_loss=0.01254, audio_tagging_loss=0.008709, over 3047534.65 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:01:52,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3302580.0, ans=0.125 2023-11-28 02:01:58,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-28 02:01:59,826 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-28 02:02:18,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3302713.3333333335, ans=0.0 2023-11-28 02:02:21,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3302780.0, ans=0.125 2023-11-28 02:02:21,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3302780.0, ans=0.1 2023-11-28 02:02:25,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3302780.0, ans=0.125 2023-11-28 02:02:32,981 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2450, loss[loss=0.06719, simple_loss=0.08143, pruned_loss=0.01639, audio_tagging_loss=0.01009, over 15330.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09072, pruned_loss=0.01258, audio_tagging_loss=0.008837, over 3044506.37 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:02:36,531 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:02:40,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302846.6666666665, ans=0.1 2023-11-28 02:02:49,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3302913.3333333335, ans=0.125 2023-11-28 02:02:57,445 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-28 02:03:17,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3303046.6666666665, ans=15.0 2023-11-28 02:03:21,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.760e+01 9.295e+01 9.959e+01 1.249e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:03:31,360 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2500, loss[loss=0.07069, simple_loss=0.09773, pruned_loss=0.01288, audio_tagging_loss=0.008952, over 15362.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.0911, pruned_loss=0.01262, audio_tagging_loss=0.008898, over 3044843.12 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:03:34,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3303180.0, ans=0.2 2023-11-28 02:03:42,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3303246.6666666665, ans=0.0 2023-11-28 02:03:54,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-28 02:03:55,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3303313.3333333335, ans=0.125 2023-11-28 02:04:08,412 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:04:26,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3303446.6666666665, ans=0.125 2023-11-28 02:04:28,465 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2550, loss[loss=0.0661, simple_loss=0.08804, pruned_loss=0.01271, audio_tagging_loss=0.009372, over 14798.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09047, pruned_loss=0.01259, audio_tagging_loss=0.008768, over 3038786.15 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:04:31,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3303513.3333333335, ans=0.1 2023-11-28 02:04:36,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.21 vs. 
limit=12.0 2023-11-28 02:04:52,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-28 02:05:17,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.589e+01 9.196e+01 9.860e+01 1.420e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 02:05:21,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3303780.0, ans=15.0 2023-11-28 02:05:26,159 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2600, loss[loss=0.04873, simple_loss=0.06066, pruned_loss=0.007472, audio_tagging_loss=0.01092, over 15542.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08878, pruned_loss=0.01239, audio_tagging_loss=0.008731, over 3036241.50 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:05:49,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3303980.0, ans=0.0 2023-11-28 02:05:50,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-28 02:05:55,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3303980.0, ans=0.0 2023-11-28 02:05:55,810 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:06:01,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3304046.6666666665, ans=0.0 2023-11-28 02:06:04,051 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:06:12,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.24 vs. limit=5.0 2023-11-28 02:06:13,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3304113.3333333335, ans=0.0 2023-11-28 02:06:18,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3304113.3333333335, ans=0.125 2023-11-28 02:06:24,277 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2650, loss[loss=0.05963, simple_loss=0.07557, pruned_loss=0.01389, audio_tagging_loss=0.007949, over 15765.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08873, pruned_loss=0.01224, audio_tagging_loss=0.008686, over 3035981.74 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:06:35,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-28 02:06:42,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.41 vs. 
limit=15.0 2023-11-28 02:06:43,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3304246.6666666665, ans=0.0 2023-11-28 02:06:48,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-28 02:06:48,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3304313.3333333335, ans=0.1 2023-11-28 02:06:50,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3304313.3333333335, ans=0.125 2023-11-28 02:06:55,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3304313.3333333335, ans=0.1 2023-11-28 02:07:12,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3304446.6666666665, ans=0.1 2023-11-28 02:07:13,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.568e+01 9.215e+01 1.005e+02 1.316e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 02:07:21,995 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2700, loss[loss=0.04569, simple_loss=0.05728, pruned_loss=0.008697, audio_tagging_loss=0.008348, over 15061.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08833, pruned_loss=0.01215, audio_tagging_loss=0.008635, over 3044977.04 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:07:28,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3304513.3333333335, ans=0.0 2023-11-28 02:07:40,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3304580.0, ans=0.1 2023-11-28 02:07:45,797 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-28 02:07:57,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3304713.3333333335, ans=0.1 2023-11-28 02:07:57,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3304713.3333333335, ans=0.07 2023-11-28 02:08:15,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3304780.0, ans=0.125 2023-11-28 02:08:19,872 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2750, loss[loss=0.04534, simple_loss=0.05474, pruned_loss=0.007333, audio_tagging_loss=0.01064, over 16250.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0874, pruned_loss=0.01213, audio_tagging_loss=0.00873, over 3045258.07 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:08:20,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.55 vs. 
limit=15.0 2023-11-28 02:08:22,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3304846.6666666665, ans=0.0 2023-11-28 02:08:23,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3304846.6666666665, ans=0.125 2023-11-28 02:08:36,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3304913.3333333335, ans=0.2 2023-11-28 02:08:43,939 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495750 2023-11-28 02:08:49,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3304980.0, ans=0.2 2023-11-28 02:09:09,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.832e+01 9.441e+01 1.011e+02 1.289e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 02:09:10,577 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:09:15,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3305113.3333333335, ans=0.125 2023-11-28 02:09:17,089 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2800, loss[loss=0.04921, simple_loss=0.0586, pruned_loss=0.008788, audio_tagging_loss=0.01112, over 13318.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.0882, pruned_loss=0.01219, audio_tagging_loss=0.008659, over 3038360.42 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:09:29,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3305246.6666666665, ans=0.125 2023-11-28 02:09:42,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495800 2023-11-28 02:09:43,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3305313.3333333335, ans=0.125 2023-11-28 02:10:00,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3305380.0, ans=0.125 2023-11-28 02:10:08,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3305446.6666666665, ans=0.0 2023-11-28 02:10:10,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2023-11-28 02:10:15,080 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2850, loss[loss=0.07339, simple_loss=0.1007, pruned_loss=0.01335, audio_tagging_loss=0.009693, over 15921.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08823, pruned_loss=0.01215, audio_tagging_loss=0.008662, over 3045958.42 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:10:15,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3305513.3333333335, ans=0.125 2023-11-28 02:10:27,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-11-28 02:10:39,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495850 2023-11-28 02:11:04,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.806e+01 9.389e+01 1.005e+02 1.417e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:11:12,658 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2900, loss[loss=0.08119, simple_loss=0.1154, pruned_loss=0.01526, audio_tagging_loss=0.008236, over 14197.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08963, pruned_loss=0.0124, audio_tagging_loss=0.008545, over 3041129.72 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:11:29,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3305913.3333333335, ans=0.0 2023-11-28 02:11:32,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3305913.3333333335, ans=10.0 2023-11-28 02:11:36,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495900 2023-11-28 02:11:56,074 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:12:03,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3306113.3333333335, ans=0.125 2023-11-28 02:12:09,959 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 2950, loss[loss=0.05396, simple_loss=0.07739, pruned_loss=0.008797, audio_tagging_loss=0.006468, over 16680.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08982, pruned_loss=0.01242, audio_tagging_loss=0.008549, over 3045250.67 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:12:13,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3306180.0, ans=0.125 2023-11-28 02:12:14,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3306180.0, ans=0.1 2023-11-28 02:12:14,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2023-11-28 02:12:34,940 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 495950 2023-11-28 02:12:55,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-28 02:13:01,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.902e+01 9.555e+01 1.025e+02 1.277e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 02:13:01,396 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:13:07,700 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3000, loss[loss=0.08143, simple_loss=0.1069, pruned_loss=0.01825, audio_tagging_loss=0.009718, over 15312.00 frames. 
], tot_loss[loss=0.06615, simple_loss=0.08973, pruned_loss=0.01254, audio_tagging_loss=0.00874, over 3045201.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:13:07,701 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 02:13:40,562 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0354, 5.9774, 5.9587, 5.7560], device='cuda:1') 2023-11-28 02:13:42,048 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05767, simple_loss=0.05061, pruned_loss=0.005183, audio_tagging_loss=0.02719, over 4681554.00 frames. 2023-11-28 02:13:42,049 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 02:13:48,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3306513.3333333335, ans=0.2 2023-11-28 02:13:56,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2023-11-28 02:13:56,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3306580.0, ans=0.035 2023-11-28 02:14:05,676 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496000 2023-11-28 02:14:17,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-28 02:14:34,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3306780.0, ans=15.0 2023-11-28 02:14:42,114 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3050, loss[loss=0.04192, simple_loss=0.05128, pruned_loss=0.004216, audio_tagging_loss=0.01207, over 14687.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09001, pruned_loss=0.01254, audio_tagging_loss=0.008797, over 3051267.56 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:14:43,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2023-11-28 02:15:01,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0 2023-11-28 02:15:04,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 02:15:05,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496050 2023-11-28 02:15:16,141 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 02:15:16,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3307046.6666666665, ans=0.125 2023-11-28 02:15:18,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3307046.6666666665, ans=0.0 2023-11-28 02:15:32,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.076e+01 8.885e+01 9.431e+01 1.019e+02 1.276e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 02:15:34,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3307113.3333333335, ans=0.125 2023-11-28 02:15:39,159 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3100, loss[loss=0.07635, simple_loss=0.1025, pruned_loss=0.01629, audio_tagging_loss=0.008837, over 16071.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09175, pruned_loss=0.01279, audio_tagging_loss=0.008748, over 3049378.08 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:15:39,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.53 vs. limit=12.0 2023-11-28 02:16:00,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3307246.6666666665, ans=0.125 2023-11-28 02:16:00,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=10.0 2023-11-28 02:16:03,479 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496100 2023-11-28 02:16:13,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3307380.0, ans=0.125 2023-11-28 02:16:22,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3307380.0, ans=0.125 2023-11-28 02:16:33,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307446.6666666665, ans=0.1 2023-11-28 02:16:36,640 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3150, loss[loss=0.08863, simple_loss=0.1222, pruned_loss=0.01839, audio_tagging_loss=0.009155, over 15780.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09069, pruned_loss=0.01255, audio_tagging_loss=0.00893, over 3051721.05 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:16:48,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. 
limit=15.0 2023-11-28 02:17:01,138 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496150 2023-11-28 02:17:13,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3307713.3333333335, ans=0.0 2023-11-28 02:17:27,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.803e+01 9.436e+01 1.005e+02 1.293e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 02:17:29,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3307780.0, ans=0.125 2023-11-28 02:17:33,936 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3200, loss[loss=0.06811, simple_loss=0.09483, pruned_loss=0.01293, audio_tagging_loss=0.007763, over 16506.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09069, pruned_loss=0.01241, audio_tagging_loss=0.008942, over 3049651.51 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:17:47,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3307913.3333333335, ans=0.0 2023-11-28 02:17:58,657 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496200 2023-11-28 02:18:01,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3307980.0, ans=0.125 2023-11-28 02:18:09,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-28 02:18:23,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3308113.3333333335, ans=0.0 2023-11-28 02:18:28,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3308113.3333333335, ans=0.125 2023-11-28 02:18:29,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3308113.3333333335, ans=0.0 2023-11-28 02:18:32,212 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3250, loss[loss=0.06378, simple_loss=0.08969, pruned_loss=0.01185, audio_tagging_loss=0.007081, over 14796.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09032, pruned_loss=0.01239, audio_tagging_loss=0.009002, over 3050483.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:18:51,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2023-11-28 02:18:56,687 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496250 2023-11-28 02:19:01,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. 
limit=15.0 2023-11-28 02:19:03,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3308313.3333333335, ans=0.2 2023-11-28 02:19:23,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.753e+01 9.382e+01 9.909e+01 1.200e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 02:19:25,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3308446.6666666665, ans=0.2 2023-11-28 02:19:29,809 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3300, loss[loss=0.07143, simple_loss=0.09507, pruned_loss=0.01484, audio_tagging_loss=0.00906, over 15776.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09033, pruned_loss=0.01236, audio_tagging_loss=0.008955, over 3055104.17 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:19:54,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496300 2023-11-28 02:20:28,045 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3350, loss[loss=0.06587, simple_loss=0.09049, pruned_loss=0.01141, audio_tagging_loss=0.009219, over 15318.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09061, pruned_loss=0.01228, audio_tagging_loss=0.008889, over 3060087.07 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:20:52,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496350 2023-11-28 02:21:04,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3309046.6666666665, ans=0.0 2023-11-28 02:21:19,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.880e+01 9.434e+01 1.005e+02 1.295e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 02:21:25,799 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3400, loss[loss=0.05787, simple_loss=0.07639, pruned_loss=0.007629, audio_tagging_loss=0.01205, over 15669.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09008, pruned_loss=0.01219, audio_tagging_loss=0.008741, over 3057715.46 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:21:32,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3309180.0, ans=0.95 2023-11-28 02:21:41,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-28 02:21:49,512 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-28 02:22:20,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. limit=10.0 2023-11-28 02:22:23,523 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3450, loss[loss=0.05954, simple_loss=0.08882, pruned_loss=0.008861, audio_tagging_loss=0.006269, over 14803.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0901, pruned_loss=0.01222, audio_tagging_loss=0.008578, over 3054958.07 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:22:29,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309513.3333333335, ans=0.1 2023-11-28 02:22:40,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3309580.0, ans=0.125 2023-11-28 02:22:48,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-28 02:23:13,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.800e+01 9.317e+01 1.017e+02 1.307e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 02:23:20,400 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3500, loss[loss=0.04913, simple_loss=0.06706, pruned_loss=0.006791, audio_tagging_loss=0.008814, over 16789.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0904, pruned_loss=0.01237, audio_tagging_loss=0.008558, over 3052472.49 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:23:20,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3309846.6666666665, ans=0.125 2023-11-28 02:23:45,615 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-28 02:23:52,042 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:24:09,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3310113.3333333335, ans=0.2 2023-11-28 02:24:13,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3310113.3333333335, ans=0.2 2023-11-28 02:24:15,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3310113.3333333335, ans=0.1 2023-11-28 02:24:18,479 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3550, loss[loss=0.06824, simple_loss=0.08643, pruned_loss=0.01684, audio_tagging_loss=0.008181, over 14956.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08959, pruned_loss=0.01231, audio_tagging_loss=0.008631, over 3052599.40 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:24:18,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3310180.0, ans=0.0 2023-11-28 02:24:19,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3310180.0, ans=0.125 2023-11-28 02:24:31,483 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:24:42,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-28 02:25:08,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.721e+01 9.297e+01 1.018e+02 1.196e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:25:15,739 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3600, loss[loss=0.05792, simple_loss=0.07732, pruned_loss=0.01196, audio_tagging_loss=0.007296, over 14264.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08999, pruned_loss=0.0125, audio_tagging_loss=0.008621, over 3051803.44 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:25:29,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3310580.0, ans=0.09899494936611666 2023-11-28 02:25:39,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-28 02:25:41,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=22.5 2023-11-28 02:25:51,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3310713.3333333335, ans=0.2 2023-11-28 02:26:10,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2023-11-28 02:26:12,306 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3650, loss[loss=0.05776, simple_loss=0.07263, pruned_loss=0.01298, audio_tagging_loss=0.008457, over 15838.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09028, pruned_loss=0.01241, audio_tagging_loss=0.008594, over 3044959.54 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:26:25,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3310913.3333333335, ans=0.1 2023-11-28 02:26:31,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2023-11-28 02:26:36,646 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-28 02:26:47,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3311046.6666666665, ans=0.0 2023-11-28 02:26:48,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311046.6666666665, ans=0.1 2023-11-28 02:27:03,704 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.594e+01 9.332e+01 1.003e+02 1.328e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 02:27:09,744 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3700, loss[loss=0.0816, simple_loss=0.1073, pruned_loss=0.01877, audio_tagging_loss=0.00919, over 16999.00 frames. 
], tot_loss[loss=0.06542, simple_loss=0.08891, pruned_loss=0.01228, audio_tagging_loss=0.008687, over 3045475.05 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:27:24,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3311246.6666666665, ans=0.125 2023-11-28 02:27:27,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3311246.6666666665, ans=0.125 2023-11-28 02:27:34,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-28 02:27:36,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3311313.3333333335, ans=0.125 2023-11-28 02:27:42,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3311313.3333333335, ans=0.0 2023-11-28 02:28:07,640 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3750, loss[loss=0.06431, simple_loss=0.08957, pruned_loss=0.01063, audio_tagging_loss=0.00889, over 15040.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08915, pruned_loss=0.0123, audio_tagging_loss=0.008712, over 3053719.93 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:28:15,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3311513.3333333335, ans=0.125 2023-11-28 02:28:20,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3311580.0, ans=0.125 2023-11-28 02:28:30,792 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-28 02:28:42,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3311713.3333333335, ans=0.2 2023-11-28 02:28:47,251 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:28:57,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3311780.0, ans=0.125 2023-11-28 02:28:58,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.781e+01 9.240e+01 9.958e+01 1.596e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 02:29:01,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-28 02:29:04,354 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3800, loss[loss=0.07034, simple_loss=0.1056, pruned_loss=0.01099, audio_tagging_loss=0.006549, over 15196.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09015, pruned_loss=0.01261, audio_tagging_loss=0.008766, over 3054906.38 frames. 
], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:29:06,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3311846.6666666665, ans=0.0 2023-11-28 02:29:28,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.90 vs. limit=22.5 2023-11-28 02:29:28,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-28 02:29:28,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3311980.0, ans=0.0 2023-11-28 02:29:36,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3311980.0, ans=0.125 2023-11-28 02:29:44,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2023-11-28 02:30:00,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3312180.0, ans=0.125 2023-11-28 02:30:01,645 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3850, loss[loss=0.04391, simple_loss=0.05898, pruned_loss=0.00574, audio_tagging_loss=0.008679, over 16572.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08944, pruned_loss=0.01245, audio_tagging_loss=0.008914, over 3052601.29 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:30:05,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3312180.0, ans=0.0 2023-11-28 02:30:23,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2023-11-28 02:30:23,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3312313.3333333335, ans=0.2 2023-11-28 02:30:26,030 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-28 02:30:27,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2023-11-28 02:30:40,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3312380.0, ans=0.125 2023-11-28 02:30:41,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0 2023-11-28 02:30:53,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.894e+01 9.500e+01 1.019e+02 1.780e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 02:30:54,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3312446.6666666665, ans=0.125 2023-11-28 02:30:55,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3312446.6666666665, ans=0.125 2023-11-28 02:30:59,264 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3900, loss[loss=0.07194, simple_loss=0.09353, pruned_loss=0.01699, audio_tagging_loss=0.008186, over 15267.00 frames. 
], tot_loss[loss=0.06595, simple_loss=0.08944, pruned_loss=0.01239, audio_tagging_loss=0.008844, over 3042223.63 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:31:22,915 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-28 02:31:26,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3312646.6666666665, ans=0.07 2023-11-28 02:31:49,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3312780.0, ans=0.1 2023-11-28 02:31:56,457 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 3950, loss[loss=0.08189, simple_loss=0.115, pruned_loss=0.01558, audio_tagging_loss=0.008809, over 15595.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08894, pruned_loss=0.01221, audio_tagging_loss=0.008989, over 3031204.07 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:31:58,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-28 02:31:59,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-28 02:32:03,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-28 02:32:06,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-28 02:32:15,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3312913.3333333335, ans=0.09899494936611666 2023-11-28 02:32:19,814 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-28 02:32:26,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:26,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3312980.0, ans=0.0 2023-11-28 02:32:27,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:34,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3313046.6666666665, ans=0.0 2023-11-28 02:32:46,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.841e+01 9.484e+01 1.039e+02 1.407e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:32:47,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3313113.3333333335, ans=0.2 2023-11-28 02:32:52,897 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4000, loss[loss=0.05146, simple_loss=0.06133, pruned_loss=0.009028, audio_tagging_loss=0.01177, over 14757.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08833, pruned_loss=0.012, audio_tagging_loss=0.009046, over 3038103.19 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:33:02,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3313246.6666666665, ans=0.0 2023-11-28 02:33:17,221 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-28 02:33:30,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0 2023-11-28 02:33:49,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3313513.3333333335, ans=0.2 2023-11-28 02:33:49,966 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4050, loss[loss=0.06062, simple_loss=0.09201, pruned_loss=0.006566, audio_tagging_loss=0.008042, over 15607.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08904, pruned_loss=0.01196, audio_tagging_loss=0.008973, over 3041362.03 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:33:52,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3313513.3333333335, ans=0.1 2023-11-28 02:33:53,203 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:34:14,334 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-28 02:34:14,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3313646.6666666665, ans=0.0 2023-11-28 02:34:25,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3313713.3333333335, ans=0.1 2023-11-28 02:34:31,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3313713.3333333335, ans=15.0 2023-11-28 02:34:42,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.798e+01 9.309e+01 1.006e+02 1.878e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 02:34:47,163 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4100, loss[loss=0.06695, simple_loss=0.09075, pruned_loss=0.01312, audio_tagging_loss=0.008459, over 14152.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08943, pruned_loss=0.01199, audio_tagging_loss=0.008924, over 3040449.80 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:34:47,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.61 vs. 
limit=22.5 2023-11-28 02:34:58,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3313913.3333333335, ans=0.125 2023-11-28 02:35:10,940 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-28 02:35:27,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3314046.6666666665, ans=0.2 2023-11-28 02:35:29,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2023-11-28 02:35:29,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2023-11-28 02:35:33,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3314113.3333333335, ans=0.5 2023-11-28 02:35:44,010 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4150, loss[loss=0.06897, simple_loss=0.09822, pruned_loss=0.01188, audio_tagging_loss=0.00798, over 15394.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08985, pruned_loss=0.01214, audio_tagging_loss=0.00876, over 3048460.99 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:35:53,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2023-11-28 02:36:08,946 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-28 02:36:14,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3314313.3333333335, ans=0.0 2023-11-28 02:36:16,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3314313.3333333335, ans=0.2 2023-11-28 02:36:20,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3314380.0, ans=0.2 2023-11-28 02:36:21,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3314380.0, ans=0.0 2023-11-28 02:36:27,055 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:36:28,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3314380.0, ans=0.125 2023-11-28 02:36:37,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.734e+01 9.392e+01 9.837e+01 1.224e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:36:41,688 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4200, loss[loss=0.06966, simple_loss=0.09393, pruned_loss=0.01239, audio_tagging_loss=0.0103, over 15216.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09004, pruned_loss=0.0122, audio_tagging_loss=0.008633, over 3043450.63 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:36:46,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2023-11-28 02:36:46,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3314513.3333333335, ans=15.0 2023-11-28 02:36:48,465 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:36:57,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-28 02:37:06,198 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-28 02:37:09,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3314646.6666666665, ans=0.0 2023-11-28 02:37:10,094 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:37:13,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3314646.6666666665, ans=0.1 2023-11-28 02:37:40,157 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4250, loss[loss=0.07873, simple_loss=0.1063, pruned_loss=0.01714, audio_tagging_loss=0.008436, over 14810.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09048, pruned_loss=0.01244, audio_tagging_loss=0.008659, over 3039047.30 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:37:41,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3314846.6666666665, ans=0.2 2023-11-28 02:37:52,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3314913.3333333335, ans=0.1 2023-11-28 02:38:04,135 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-28 02:38:04,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3314980.0, ans=0.125 2023-11-28 02:38:21,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3315046.6666666665, ans=0.0 2023-11-28 02:38:23,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3315046.6666666665, ans=0.0 2023-11-28 02:38:23,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3315046.6666666665, ans=0.125 2023-11-28 02:38:32,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.722e+01 9.477e+01 1.017e+02 1.335e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 02:38:36,937 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4300, loss[loss=0.07971, simple_loss=0.09517, pruned_loss=0.01983, audio_tagging_loss=0.01229, over 15177.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09025, pruned_loss=0.01251, audio_tagging_loss=0.008667, over 3046965.53 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:38:49,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0 2023-11-28 02:38:50,280 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:38:57,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3315246.6666666665, ans=0.2 2023-11-28 02:38:59,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=22.5 2023-11-28 02:39:01,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-28 02:39:14,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3315380.0, ans=0.0 2023-11-28 02:39:19,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3315380.0, ans=0.125 2023-11-28 02:39:22,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315446.6666666665, ans=0.1 2023-11-28 02:39:23,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2023-11-28 02:39:27,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3315446.6666666665, ans=0.125 2023-11-28 02:39:33,973 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4350, loss[loss=0.07084, simple_loss=0.09691, pruned_loss=0.01311, audio_tagging_loss=0.009271, over 14392.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09159, pruned_loss=0.01289, audio_tagging_loss=0.008615, over 3050358.66 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:39:38,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3315513.3333333335, ans=0.0 2023-11-28 02:39:58,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-28 02:40:00,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3315646.6666666665, ans=0.125 2023-11-28 02:40:05,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2023-11-28 02:40:17,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3315713.3333333335, ans=0.125 2023-11-28 02:40:21,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.60 vs. limit=22.5 2023-11-28 02:40:23,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. 
limit=15.0 2023-11-28 02:40:26,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.957e+01 9.552e+01 1.043e+02 1.269e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 02:40:31,056 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4400, loss[loss=0.06556, simple_loss=0.08799, pruned_loss=0.01345, audio_tagging_loss=0.008117, over 15306.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09097, pruned_loss=0.01282, audio_tagging_loss=0.008572, over 3050110.45 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:40:32,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3315846.6666666665, ans=0.035 2023-11-28 02:40:55,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-28 02:40:56,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-28 02:41:23,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3316113.3333333335, ans=0.125 2023-11-28 02:41:29,273 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4450, loss[loss=0.0857, simple_loss=0.128, pruned_loss=0.01829, audio_tagging_loss=0.003388, over 14476.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09082, pruned_loss=0.01267, audio_tagging_loss=0.008561, over 3050460.72 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:41:29,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3316180.0, ans=0.2 2023-11-28 02:41:38,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3316180.0, ans=0.0 2023-11-28 02:41:47,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-28 02:41:47,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-28 02:41:53,487 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-28 02:42:04,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-28 02:42:08,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-28 02:42:22,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.078e+01 9.731e+01 1.036e+02 1.394e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 02:42:27,231 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4500, loss[loss=0.05563, simple_loss=0.08157, pruned_loss=0.008529, audio_tagging_loss=0.006321, over 16105.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09107, pruned_loss=0.01269, audio_tagging_loss=0.008652, over 3053475.20 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:42:43,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.65 vs. 
limit=22.5 2023-11-28 02:42:50,803 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-28 02:42:58,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=12.0 2023-11-28 02:43:11,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3316713.3333333335, ans=0.125 2023-11-28 02:43:17,617 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:43:24,681 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4550, loss[loss=0.07249, simple_loss=0.1018, pruned_loss=0.01207, audio_tagging_loss=0.009545, over 15393.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08977, pruned_loss=0.01245, audio_tagging_loss=0.008705, over 3045598.57 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:43:24,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3316846.6666666665, ans=0.125 2023-11-28 02:43:33,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3316846.6666666665, ans=0.05 2023-11-28 02:43:49,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-28 02:43:59,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3317046.6666666665, ans=0.2 2023-11-28 02:44:00,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3317046.6666666665, ans=0.5 2023-11-28 02:44:09,510 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:44:18,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.674e+01 9.170e+01 9.991e+01 1.281e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-28 02:44:21,531 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4600, loss[loss=0.0658, simple_loss=0.07987, pruned_loss=0.01425, audio_tagging_loss=0.01162, over 14291.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08897, pruned_loss=0.0123, audio_tagging_loss=0.008758, over 3044193.71 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:44:46,324 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-28 02:45:02,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.49 vs. 
limit=15.0 2023-11-28 02:45:09,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3317446.6666666665, ans=0.125 2023-11-28 02:45:18,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3317446.6666666665, ans=0.125 2023-11-28 02:45:20,474 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4650, loss[loss=0.06189, simple_loss=0.09141, pruned_loss=0.008775, audio_tagging_loss=0.007412, over 15529.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08961, pruned_loss=0.01236, audio_tagging_loss=0.008786, over 3046030.12 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:45:44,334 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-28 02:46:14,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.773e+01 9.249e+01 1.003e+02 1.204e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 02:46:17,629 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4700, loss[loss=0.04539, simple_loss=0.05766, pruned_loss=0.005274, audio_tagging_loss=0.01129, over 14910.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08939, pruned_loss=0.01241, audio_tagging_loss=0.008858, over 3045739.90 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:46:19,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3317846.6666666665, ans=0.0 2023-11-28 02:46:20,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3317846.6666666665, ans=0.07 2023-11-28 02:46:26,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2023-11-28 02:46:35,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3317913.3333333335, ans=0.125 2023-11-28 02:46:40,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3317980.0, ans=0.0 2023-11-28 02:46:42,395 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-28 02:46:48,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3317980.0, ans=0.1 2023-11-28 02:46:52,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3318046.6666666665, ans=0.04949747468305833 2023-11-28 02:47:02,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-28 02:47:08,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3318113.3333333335, ans=0.125 2023-11-28 02:47:14,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3318180.0, ans=0.0 2023-11-28 02:47:14,947 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4750, loss[loss=0.07758, simple_loss=0.1003, pruned_loss=0.01853, audio_tagging_loss=0.008879, over 14591.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08977, pruned_loss=0.01241, audio_tagging_loss=0.008878, over 3044854.86 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:47:37,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3318313.3333333335, ans=0.125 2023-11-28 02:47:39,375 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-28 02:47:50,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs. limit=10.0 2023-11-28 02:47:55,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3318380.0, ans=0.07 2023-11-28 02:48:08,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.846e+01 9.343e+01 1.002e+02 1.233e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:48:13,300 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4800, loss[loss=0.05817, simple_loss=0.0749, pruned_loss=0.01131, audio_tagging_loss=0.00941, over 13581.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.0899, pruned_loss=0.01254, audio_tagging_loss=0.008927, over 3043538.46 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:48:13,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3318513.3333333335, ans=0.1 2023-11-28 02:48:27,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2023-11-28 02:48:30,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3318580.0, ans=0.125 2023-11-28 02:48:35,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3318646.6666666665, ans=0.1 2023-11-28 02:48:37,236 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-28 02:48:47,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3318713.3333333335, ans=0.0 2023-11-28 02:48:58,620 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:48:59,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.93 vs. limit=10.0 2023-11-28 02:49:10,419 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4850, loss[loss=0.05151, simple_loss=0.07115, pruned_loss=0.008394, audio_tagging_loss=0.007545, over 14805.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08991, pruned_loss=0.01248, audio_tagging_loss=0.009001, over 3040787.22 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:49:22,105 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:49:34,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-28 02:49:50,189 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:49:51,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3319046.6666666665, ans=0.07 2023-11-28 02:50:01,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3319113.3333333335, ans=0.125 2023-11-28 02:50:05,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.681e+01 9.347e+01 1.000e+02 1.245e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:50:07,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3319180.0, ans=0.125 2023-11-28 02:50:08,187 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4900, loss[loss=0.07008, simple_loss=0.08919, pruned_loss=0.014, audio_tagging_loss=0.01148, over 16581.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09025, pruned_loss=0.01261, audio_tagging_loss=0.008967, over 3041365.12 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:50:16,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-28 02:50:18,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3319246.6666666665, ans=0.5 2023-11-28 02:50:33,046 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-28 02:50:33,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=22.5 2023-11-28 02:50:54,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3319446.6666666665, ans=0.125 2023-11-28 02:51:05,925 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 4950, loss[loss=0.06, simple_loss=0.08041, pruned_loss=0.01194, audio_tagging_loss=0.007854, over 14873.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09001, pruned_loss=0.01232, audio_tagging_loss=0.008775, over 3043373.96 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:51:19,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3319580.0, ans=0.125 2023-11-28 02:51:21,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3319580.0, ans=0.125 2023-11-28 02:51:31,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-28 02:51:42,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319713.3333333335, ans=0.1 2023-11-28 02:52:01,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.571e+01 9.206e+01 9.727e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 02:52:04,062 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5000, loss[loss=0.06216, simple_loss=0.08273, pruned_loss=0.01163, audio_tagging_loss=0.009168, over 16609.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09058, pruned_loss=0.01255, audio_tagging_loss=0.008649, over 3049460.73 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:52:27,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-28 02:52:37,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-28 02:52:41,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3320046.6666666665, ans=0.125 2023-11-28 02:52:49,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=22.5 2023-11-28 02:53:01,680 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5050, loss[loss=0.07602, simple_loss=0.1001, pruned_loss=0.01815, audio_tagging_loss=0.007831, over 14867.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09092, pruned_loss=0.01256, audio_tagging_loss=0.008613, over 3053132.30 frames. 
2023-11-28 02:53:02,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3320180.0, ans=0.2
2023-11-28 02:53:08,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3320180.0, ans=0.0
2023-11-28 02:53:17,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3320246.6666666665, ans=0.2
2023-11-28 02:53:25,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498050
2023-11-28 02:53:26,786 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 02:53:31,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3320313.3333333335, ans=0.0
2023-11-28 02:53:41,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3320380.0, ans=0.125
2023-11-28 02:53:53,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3320446.6666666665, ans=0.125
2023-11-28 02:53:56,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.785e+01 9.412e+01 9.952e+01 1.191e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-28 02:53:58,594 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5100, loss[loss=0.07148, simple_loss=0.09961, pruned_loss=0.01248, audio_tagging_loss=0.009193, over 16947.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09048, pruned_loss=0.01258, audio_tagging_loss=0.008586, over 3051815.32 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:54:17,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3320580.0, ans=0.125
2023-11-28 02:54:23,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498100
2023-11-28 02:54:31,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3320646.6666666665, ans=0.125
2023-11-28 02:54:56,893 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5150, loss[loss=0.06093, simple_loss=0.07612, pruned_loss=0.01171, audio_tagging_loss=0.01116, over 15626.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08984, pruned_loss=0.01252, audio_tagging_loss=0.008626, over 3047919.51 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:55:01,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2023-11-28 02:55:12,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3320913.3333333335, ans=0.0
2023-11-28 02:55:13,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3320913.3333333335, ans=0.0
2023-11-28 02:55:14,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3320913.3333333335, ans=0.125
2023-11-28 02:55:16,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.50 vs. limit=22.5
2023-11-28 02:55:21,141 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498150
2023-11-28 02:55:53,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.740e+01 9.410e+01 1.002e+02 1.466e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-28 02:55:54,469 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5200, loss[loss=0.05486, simple_loss=0.08242, pruned_loss=0.007873, audio_tagging_loss=0.005773, over 15377.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09062, pruned_loss=0.01275, audio_tagging_loss=0.008543, over 3047719.58 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:56:18,558 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498200
2023-11-28 02:56:25,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321313.3333333335, ans=0.1
2023-11-28 02:56:34,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3321380.0, ans=0.2
2023-11-28 02:56:38,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.91 vs. limit=10.0
2023-11-28 02:56:51,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0
2023-11-28 02:56:51,818 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5250, loss[loss=0.07524, simple_loss=0.1095, pruned_loss=0.01329, audio_tagging_loss=0.007231, over 16180.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09164, pruned_loss=0.0131, audio_tagging_loss=0.008433, over 3044126.70 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:57:00,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3321513.3333333335, ans=0.0
2023-11-28 02:57:02,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3321580.0, ans=0.1
2023-11-28 02:57:08,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3321580.0, ans=0.125
2023-11-28 02:57:11,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3321580.0, ans=0.125
2023-11-28 02:57:12,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3321580.0, ans=0.2
2023-11-28 02:57:16,785 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498250
2023-11-28 02:57:18,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3321646.6666666665, ans=0.125
2023-11-28 02:57:22,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.73 vs. limit=6.0
2023-11-28 02:57:44,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0
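
Every [scaling.py:213] line prints a ScheduledFloat: a module hyperparameter (dropout probability, skip rate, min_abs floor, bypass scale) whose value `ans` is a function of `batch_count`. A minimal sketch of such a piecewise-linear schedule follows; the breakpoints are invented for illustration, and by batch_count ~3.3e6 every schedule in this excerpt has long since settled on its final constant value:

```python
from bisect import bisect_right

class PiecewiseLinear:
    """Minimal sketch of a batch_count -> float schedule (illustrative only)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]  # flat tail, as everywhere in this log
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout annealing from 0.3 to 0.1 over the first 20k batches
# (hypothetical breakpoints) is constant by 3.3M batches:
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(3321313.33) == 0.1  # cf. the "...dropout_p ... ans=0.1" lines
```
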
2023-11-28 02:57:48,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.904e+01 9.487e+01 1.032e+02 1.355e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-28 02:57:49,440 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5300, loss[loss=0.09363, simple_loss=0.1336, pruned_loss=0.01929, audio_tagging_loss=0.007553, over 15019.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.0922, pruned_loss=0.01302, audio_tagging_loss=0.008474, over 3043429.83 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:57:49,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3321846.6666666665, ans=0.2
2023-11-28 02:58:04,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3321913.3333333335, ans=15.0
2023-11-28 02:58:09,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321913.3333333335, ans=0.1
2023-11-28 02:58:09,369 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 02:58:13,574 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498300
2023-11-28 02:58:34,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3322113.3333333335, ans=0.125
2023-11-28 02:58:35,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3322113.3333333335, ans=0.0
2023-11-28 02:58:47,140 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5350, loss[loss=0.0621, simple_loss=0.08903, pruned_loss=0.009925, audio_tagging_loss=0.007663, over 15325.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09214, pruned_loss=0.01301, audio_tagging_loss=0.008492, over 3047731.56 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:59:02,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3322246.6666666665, ans=0.0
2023-11-28 02:59:11,094 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498350
2023-11-28 02:59:16,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3322313.3333333335, ans=0.125
2023-11-28 02:59:42,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.664e+01 9.180e+01 9.721e+01 1.287e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-28 02:59:44,020 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5400, loss[loss=0.07184, simple_loss=0.1059, pruned_loss=0.01202, audio_tagging_loss=0.006842, over 14808.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.0931, pruned_loss=0.01321, audio_tagging_loss=0.008531, over 3047516.32 frames. ], batch size: 52, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:59:58,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3322580.0, ans=0.125
2023-11-28 03:00:01,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3322580.0, ans=0.0
2023-11-28 03:00:08,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498400
2023-11-28 03:00:12,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3322646.6666666665, ans=0.125
2023-11-28 03:00:33,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3322780.0, ans=0.125
2023-11-28 03:00:42,008 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5450, loss[loss=0.06827, simple_loss=0.09501, pruned_loss=0.01261, audio_tagging_loss=0.008148, over 15195.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09237, pruned_loss=0.01305, audio_tagging_loss=0.008622, over 3041872.18 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:00:42,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3322846.6666666665, ans=0.125
2023-11-28 03:00:43,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3322846.6666666665, ans=0.125
2023-11-28 03:00:47,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3322846.6666666665, ans=0.07
2023-11-28 03:00:54,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3322913.3333333335, ans=0.125
2023-11-28 03:01:06,767 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498450
2023-11-28 03:01:09,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3322980.0, ans=0.125
2023-11-28 03:01:20,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3323046.6666666665, ans=0.5
2023-11-28 03:01:24,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3323046.6666666665, ans=0.0
2023-11-28 03:01:38,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.881e+01 9.599e+01 1.024e+02 1.269e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-28 03:01:39,530 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5500, loss[loss=0.05752, simple_loss=0.07403, pruned_loss=0.01009, audio_tagging_loss=0.01042, over 15718.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09197, pruned_loss=0.01297, audio_tagging_loss=0.00869, over 3049453.35 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0
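
The per-batch [train_asr.py:1235] lines decompose the training objective. Across this excerpt the printed fields satisfy loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for batch 5450 above, 0.5 * 0.09501 + 0.01261 + 0.008148 = 0.06826, matching the logged loss=0.06827 up to display rounding. The weights in the sketch below are inferred from those numbers, not quoted from the training script:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    """Combination implied by the logged numbers (weights inferred)."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# batch 5450: loss[loss=0.06827, simple_loss=0.09501, pruned_loss=0.01261,
#                  audio_tagging_loss=0.008148]
assert abs(combined_loss(0.09501, 0.01261, 0.008148) - 0.06827) < 1e-4
```
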
2023-11-28 03:01:39,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3323180.0, ans=0.1
2023-11-28 03:01:53,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323246.6666666665, ans=0.1
2023-11-28 03:02:04,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498500
2023-11-28 03:02:08,144 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:02:25,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0
2023-11-28 03:02:29,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3323446.6666666665, ans=0.0
2023-11-28 03:02:37,299 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5550, loss[loss=0.06577, simple_loss=0.08709, pruned_loss=0.01091, audio_tagging_loss=0.01131, over 15604.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09221, pruned_loss=0.01294, audio_tagging_loss=0.008798, over 3049437.99 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:02:50,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3323580.0, ans=0.0
2023-11-28 03:02:54,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3323580.0, ans=0.0
2023-11-28 03:02:59,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3323646.6666666665, ans=0.2
2023-11-28 03:03:00,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3323646.6666666665, ans=0.125
2023-11-28 03:03:01,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498550
2023-11-28 03:03:02,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0
2023-11-28 03:03:03,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3323646.6666666665, ans=0.125
2023-11-28 03:03:09,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3323646.6666666665, ans=0.125
2023-11-28 03:03:12,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. limit=22.5
2023-11-28 03:03:28,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3323780.0, ans=0.0
2023-11-28 03:03:33,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.559e+01 9.219e+01 9.829e+01 1.565e+02, threshold=1.844e+02, percent-clipped=0.0
2023-11-28 03:03:35,071 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5600, loss[loss=0.06828, simple_loss=0.09334, pruned_loss=0.01185, audio_tagging_loss=0.009759, over 16354.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09042, pruned_loss=0.0125, audio_tagging_loss=0.008925, over 3046422.43 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:03:44,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3323846.6666666665, ans=0.125
2023-11-28 03:03:59,255 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498600
2023-11-28 03:03:59,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323980.0, ans=0.1
2023-11-28 03:04:17,633 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 03:04:18,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3324046.6666666665, ans=0.0
2023-11-28 03:04:31,809 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5650, loss[loss=0.08426, simple_loss=0.114, pruned_loss=0.01958, audio_tagging_loss=0.007699, over 14112.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08981, pruned_loss=0.01232, audio_tagging_loss=0.008874, over 3051045.77 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:04:42,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3324246.6666666665, ans=0.125
2023-11-28 03:04:55,887 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498650
2023-11-28 03:05:03,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3324313.3333333335, ans=0.0
2023-11-28 03:05:04,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3324313.3333333335, ans=0.125
2023-11-28 03:05:06,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3324380.0, ans=0.0
2023-11-28 03:05:22,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3324446.6666666665, ans=0.125
2023-11-28 03:05:23,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3324446.6666666665, ans=0.0
2023-11-28 03:05:28,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.782e+01 9.473e+01 1.042e+02 1.222e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-28 03:05:29,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3324513.3333333335, ans=10.0
2023-11-28 03:05:29,877 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5700, loss[loss=0.06033, simple_loss=0.08794, pruned_loss=0.008903, audio_tagging_loss=0.007452, over 15018.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08882, pruned_loss=0.01222, audio_tagging_loss=0.008939, over 3047144.89 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0
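
The grad_scale field in the per-batch lines is the dynamic loss scale used for fp16 training; in this excerpt it moves among 8.0, 16.0 and 32.0 as steps overflow and later recover. That is the behaviour torch.cuda.amp.GradScaler produces, halving on overflow and doubling after a run of finite steps. A generic sketch follows; the hyperparameter values and the train-step interface are assumptions, not taken from train_asr.py:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # one of the values seen in this excerpt
    growth_factor=2.0,    # double after growth_interval clean steps
    backoff_factor=0.5,   # halve when a step yields inf/nan gradients
    growth_interval=2000,
)

def train_step(model, optimizer, features, targets, criterion):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skipped if gradients overflowed
    scaler.update()         # here grad_scale halves or (eventually) doubles
    return scaler.get_scale()
```
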
2023-11-28 03:05:37,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3324513.3333333335, ans=0.125
2023-11-28 03:05:53,891 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498700
2023-11-28 03:05:57,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3324646.6666666665, ans=0.07
2023-11-28 03:06:19,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3324780.0, ans=0.125
2023-11-28 03:06:27,543 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5750, loss[loss=0.09159, simple_loss=0.1275, pruned_loss=0.02216, audio_tagging_loss=0.005695, over 15457.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.089, pruned_loss=0.01222, audio_tagging_loss=0.00881, over 3049340.26 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:06:45,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3324913.3333333335, ans=0.125
2023-11-28 03:06:50,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3324980.0, ans=0.125
2023-11-28 03:06:51,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498750
2023-11-28 03:07:15,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0
2023-11-28 03:07:22,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.740e+01 9.291e+01 9.936e+01 1.231e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-28 03:07:23,848 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5800, loss[loss=0.07398, simple_loss=0.09875, pruned_loss=0.01634, audio_tagging_loss=0.008263, over 16236.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08836, pruned_loss=0.01234, audio_tagging_loss=0.008749, over 3043894.29 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:07:24,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3325180.0, ans=0.1
2023-11-28 03:07:37,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3325246.6666666665, ans=0.1
2023-11-28 03:07:39,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0
2023-11-28 03:07:42,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3325246.6666666665, ans=0.0
2023-11-28 03:07:48,073 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498800
2023-11-28 03:07:53,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0
2023-11-28 03:08:08,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3325380.0, ans=0.125
2023-11-28 03:08:15,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3325446.6666666665, ans=0.1
2023-11-28 03:08:21,654 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5850, loss[loss=0.07878, simple_loss=0.1143, pruned_loss=0.01404, audio_tagging_loss=0.007583, over 14262.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08786, pruned_loss=0.01228, audio_tagging_loss=0.008793, over 3040994.77 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:08:29,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3325513.3333333335, ans=0.125
2023-11-28 03:08:46,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498850
2023-11-28 03:08:46,399 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:08:51,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2023-11-28 03:08:55,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0
2023-11-28 03:09:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3325780.0, ans=0.125
2023-11-28 03:09:13,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3325780.0, ans=0.125
2023-11-28 03:09:14,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3325780.0, ans=0.2
2023-11-28 03:09:15,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3325780.0, ans=0.125
2023-11-28 03:09:18,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.725e+01 9.320e+01 1.016e+02 1.515e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-28 03:09:19,660 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5900, loss[loss=0.06848, simple_loss=0.09045, pruned_loss=0.01454, audio_tagging_loss=0.008719, over 15446.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08862, pruned_loss=0.01227, audio_tagging_loss=0.008743, over 3043888.56 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:09:21,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0
2023-11-28 03:09:25,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0
2023-11-28 03:09:29,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0
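
The [scaling.py:1022] Whitening lines compare a per-module statistic against a limit (metric=9.28 vs. limit=15.0 and so on). The metric plausibly measures how far the activation covariance is from a multiple of the identity: 1.0 for perfectly "white" features, larger as the covariance concentrates. The following is a reconstruction consistent with the logged num_groups/num_channels fields, not a copy of icefall's scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """d * trace(C @ C) / trace(C)**2 per channel group, averaged (assumed form).

    Equals 1.0 when the covariance C is a multiple of the identity and
    grows as the spectrum of C becomes less uniform.
    """
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames     # (groups, d, d)
    trace_cov = cov.diagonal(dim1=1, dim2=2).sum(-1)          # trace(C)
    trace_cov_sq = (cov * cov).sum(dim=(1, 2))                # trace(C @ C), C symmetric
    return (d * trace_cov_sq / trace_cov.pow(2)).mean()

# Sanity check: white noise scores near 1, far below limits like 15.0 or 22.5.
assert whitening_metric(torch.randn(10000, 512)).item() < 1.5
```
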
2023-11-28 03:09:43,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498900
2023-11-28 03:09:48,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3325980.0, ans=0.125
2023-11-28 03:10:07,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3326113.3333333335, ans=0.125
2023-11-28 03:10:08,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3326113.3333333335, ans=0.125
2023-11-28 03:10:09,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0
2023-11-28 03:10:17,157 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 5950, loss[loss=0.07219, simple_loss=0.1086, pruned_loss=0.0133, audio_tagging_loss=0.004573, over 15532.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09056, pruned_loss=0.01245, audio_tagging_loss=0.008539, over 3052790.20 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:10:32,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3326246.6666666665, ans=0.1
2023-11-28 03:10:35,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0
2023-11-28 03:10:40,931 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 498950
2023-11-28 03:10:57,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3326380.0, ans=0.125
2023-11-28 03:11:03,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326446.6666666665, ans=0.1
2023-11-28 03:11:08,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326446.6666666665, ans=0.1
2023-11-28 03:11:14,370 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.682e+01 9.363e+01 1.001e+02 1.313e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-28 03:11:14,396 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6000, loss[loss=0.04057, simple_loss=0.05559, pruned_loss=0.004977, audio_tagging_loss=0.007801, over 14902.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09118, pruned_loss=0.01272, audio_tagging_loss=0.008526, over 3053897.89 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:11:14,397 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-28 03:11:49,934 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05789, simple_loss=0.05056, pruned_loss=0.005172, audio_tagging_loss=0.02743, over 4681554.00 frames.
2023-11-28 03:11:49,935 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-28 03:12:06,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326580.0, ans=0.1
2023-11-28 03:12:13,507 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499000
2023-11-28 03:12:23,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0
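
Training periodically pauses for a full pass over a fixed dev set, which is why the validation line at batch 6000 above reports the same "over 4681554.00 frames." as every other validation check in this log, and the peak CUDA allocation is printed afterwards. A generic sketch of that loop (the per-batch model interface here is an assumption):

```python
import logging
import torch

def compute_validation_loss(model, valid_dl, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = model(batch)  # assumed per-batch interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    logging.info("validation: loss=%.4g, over %.2f frames.",
                 tot_loss / tot_frames, tot_frames)
    logging.info("Maximum memory allocated so far is %dMB",
                 torch.cuda.max_memory_allocated(device) // (1024 * 1024))
    return tot_loss / tot_frames
```
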
2023-11-28 03:12:25,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326713.3333333335, ans=0.1
2023-11-28 03:12:32,251 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 03:12:46,916 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6050, loss[loss=0.07458, simple_loss=0.106, pruned_loss=0.01262, audio_tagging_loss=0.008972, over 16053.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09094, pruned_loss=0.0127, audio_tagging_loss=0.008439, over 3052354.78 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:12:48,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3326846.6666666665, ans=0.125
2023-11-28 03:13:10,395 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499050
2023-11-28 03:13:19,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3326980.0, ans=0.125
2023-11-28 03:13:19,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3326980.0, ans=0.1
2023-11-28 03:13:27,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3327046.6666666665, ans=0.1
2023-11-28 03:13:33,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3327113.3333333335, ans=0.125
2023-11-28 03:13:37,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3327113.3333333335, ans=0.125
2023-11-28 03:13:44,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.818e+01 9.290e+01 9.982e+01 1.282e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-28 03:13:44,267 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6100, loss[loss=0.06894, simple_loss=0.09466, pruned_loss=0.01413, audio_tagging_loss=0.007474, over 15201.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09112, pruned_loss=0.01272, audio_tagging_loss=0.008536, over 3050184.80 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:14:08,813 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499100
2023-11-28 03:14:13,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3327313.3333333335, ans=0.125
2023-11-28 03:14:16,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3327313.3333333335, ans=0.1
2023-11-28 03:14:20,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0
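
The WARNING above drops an AudioSet clip whose placeholder transcript is longer than what survives the encoder's 4x temporal subsampling: 100 input frames reduce to 23, fewer than the 24 BPE tokens, so the transducer loss cannot be computed for that cut. The reduction formula below reproduces the logged 100 -> 23 arithmetic, but the exact convolution geometry is an assumption:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 reductions with a 7-frame edge loss (assumed geometry);
    # reproduces the logged "before subsampling: 100 ... after subsampling: 23".
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # 23 frames < 24 tokens -> excluded, as above
```
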
2023-11-28 03:14:30,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3327446.6666666665, ans=0.0
2023-11-28 03:14:41,475 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6150, loss[loss=0.05336, simple_loss=0.07166, pruned_loss=0.006664, audio_tagging_loss=0.01087, over 14679.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09031, pruned_loss=0.01265, audio_tagging_loss=0.008695, over 3046323.22 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:14:49,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0
2023-11-28 03:15:06,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499150
2023-11-28 03:15:10,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0
2023-11-28 03:15:13,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3327646.6666666665, ans=0.125
2023-11-28 03:15:31,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3327780.0, ans=0.125
2023-11-28 03:15:39,215 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6200, loss[loss=0.08554, simple_loss=0.1255, pruned_loss=0.01714, audio_tagging_loss=0.005671, over 15010.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09038, pruned_loss=0.01259, audio_tagging_loss=0.008649, over 3047608.70 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:15:40,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.163e+01 8.658e+01 9.318e+01 1.003e+02 1.390e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-28 03:15:44,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3327846.6666666665, ans=0.125
2023-11-28 03:15:47,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3327846.6666666665, ans=0.125
2023-11-28 03:15:49,018 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:15:56,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3327913.3333333335, ans=0.125
2023-11-28 03:15:57,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3327913.3333333335, ans=0.125
2023-11-28 03:16:02,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499200
2023-11-28 03:16:36,594 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6250, loss[loss=0.03867, simple_loss=0.0533, pruned_loss=0.004044, audio_tagging_loss=0.007971, over 16327.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09038, pruned_loss=0.01259, audio_tagging_loss=0.008742, over 3050287.85 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:17:00,549 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499250
2023-11-28 03:17:05,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3328313.3333333335, ans=0.0
2023-11-28 03:17:06,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0
2023-11-28 03:17:14,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3328380.0, ans=0.125
2023-11-28 03:17:22,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0
2023-11-28 03:17:33,311 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6300, loss[loss=0.07065, simple_loss=0.09231, pruned_loss=0.01311, audio_tagging_loss=0.01138, over 15859.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08951, pruned_loss=0.01253, audio_tagging_loss=0.008856, over 3052430.04 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:17:34,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 9.160e+01 9.772e+01 1.060e+02 1.350e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-28 03:17:36,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0
2023-11-28 03:17:38,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3328513.3333333335, ans=0.035
2023-11-28 03:17:45,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3328580.0, ans=0.125
2023-11-28 03:17:53,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3328580.0, ans=0.125
2023-11-28 03:17:58,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499300
2023-11-28 03:18:05,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328646.6666666665, ans=0.1
2023-11-28 03:18:16,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3328713.3333333335, ans=0.07
2023-11-28 03:18:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3328780.0, ans=0.0
2023-11-28 03:18:31,047 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6350, loss[loss=0.04272, simple_loss=0.05134, pruned_loss=0.00501, audio_tagging_loss=0.01204, over 14256.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08948, pruned_loss=0.01253, audio_tagging_loss=0.008911, over 3049048.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:18:53,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3328980.0, ans=0.0
2023-11-28 03:18:55,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499350
2023-11-28 03:19:08,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3329046.6666666665, ans=0.125
2023-11-28 03:19:08,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3329046.6666666665, ans=0.125
2023-11-28 03:19:29,097 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6400, loss[loss=0.07041, simple_loss=0.09302, pruned_loss=0.0129, audio_tagging_loss=0.011, over 15673.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08927, pruned_loss=0.01258, audio_tagging_loss=0.009112, over 3041516.57 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:19:30,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.920e+01 9.509e+01 1.018e+02 1.569e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 03:19:52,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499400
2023-11-28 03:20:12,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329380.0, ans=0.1
2023-11-28 03:20:12,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3329380.0, ans=0.5
2023-11-28 03:20:20,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3329446.6666666665, ans=0.0
2023-11-28 03:20:26,015 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6450, loss[loss=0.05558, simple_loss=0.0746, pruned_loss=0.006874, audio_tagging_loss=0.0114, over 15807.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08851, pruned_loss=0.01222, audio_tagging_loss=0.009137, over 3044681.04 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0
2023-11-28 03:20:39,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3329580.0, ans=0.05
2023-11-28 03:20:49,892 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499450
2023-11-28 03:20:54,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3329646.6666666665, ans=0.125
2023-11-28 03:21:00,598 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:21:23,070 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6500, loss[loss=0.09211, simple_loss=0.121, pruned_loss=0.02495, audio_tagging_loss=0.006639, over 15232.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08882, pruned_loss=0.01217, audio_tagging_loss=0.009068, over 3045722.51 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
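
The [scaling.py:1118] WithLoss lines log an auxiliary penalty attached to a tensor (here, attention weights) so that its gradient is injected during the backward pass; loss-sum=0.000e+00 throughout this excerpt, i.e. the penalty is currently inactive. A hedged sketch of the mechanism, not icefall's exact implementation:

```python
import logging
import torch

class AttachLoss(torch.autograd.Function):
    """Return x unchanged, but make `loss.sum()` contribute to backward."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, loss: torch.Tensor, name: str):
        ctx.loss_shape = loss.shape
        logging.info("WithLoss: name=%s, loss-sum=%.3e", name, loss.sum().item())
        return x  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # d(loss.sum())/d(loss) == 1 everywhere; x's gradient passes through.
        ones = torch.ones(ctx.loss_shape, dtype=grad_output.dtype,
                          device=grad_output.device)
        return grad_output, ones, None
```
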
2023-11-28 03:21:25,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.737e+01 9.352e+01 9.973e+01 1.217e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-28 03:21:44,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3329913.3333333335, ans=0.0
2023-11-28 03:21:47,141 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499500
2023-11-28 03:21:59,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0
2023-11-28 03:22:07,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3330113.3333333335, ans=0.125
2023-11-28 03:22:10,074 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:22:13,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0
2023-11-28 03:22:20,278 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6550, loss[loss=0.06382, simple_loss=0.09064, pruned_loss=0.01048, audio_tagging_loss=0.008018, over 15627.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.089, pruned_loss=0.01224, audio_tagging_loss=0.008926, over 3045247.25 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:22:22,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3330180.0, ans=0.125
2023-11-28 03:22:33,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3330246.6666666665, ans=0.07
2023-11-28 03:22:42,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3330313.3333333335, ans=0.125
2023-11-28 03:22:44,238 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499550
2023-11-28 03:22:50,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3330313.3333333335, ans=0.0
2023-11-28 03:23:04,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3330446.6666666665, ans=0.0
2023-11-28 03:23:08,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3330446.6666666665, ans=0.125
2023-11-28 03:23:09,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3330446.6666666665, ans=0.035
2023-11-28 03:23:16,568 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6600, loss[loss=0.05525, simple_loss=0.07688, pruned_loss=0.008668, audio_tagging_loss=0.008142, over 15511.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08842, pruned_loss=0.01203, audio_tagging_loss=0.008762, over 3041169.02 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:23:19,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.958e+01 9.376e+01 9.845e+01 1.305e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-28 03:23:31,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3330580.0, ans=0.2
2023-11-28 03:23:32,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0
2023-11-28 03:23:40,515 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499600
2023-11-28 03:23:40,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3330646.6666666665, ans=0.95
2023-11-28 03:23:46,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. limit=10.0
2023-11-28 03:23:58,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3330713.3333333335, ans=0.1
2023-11-28 03:24:07,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3330780.0, ans=0.125
2023-11-28 03:24:14,476 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6650, loss[loss=0.06301, simple_loss=0.09263, pruned_loss=0.007619, audio_tagging_loss=0.009074, over 14669.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08879, pruned_loss=0.01211, audio_tagging_loss=0.008746, over 3038701.53 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:24:37,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3330980.0, ans=0.2
2023-11-28 03:24:38,530 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499650
2023-11-28 03:24:53,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3331046.6666666665, ans=0.125
2023-11-28 03:25:05,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3331113.3333333335, ans=0.0
2023-11-28 03:25:10,989 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6700, loss[loss=0.0689, simple_loss=0.09165, pruned_loss=0.01367, audio_tagging_loss=0.009413, over 14720.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08913, pruned_loss=0.01201, audio_tagging_loss=0.008634, over 3043410.85 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:25:14,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.626e+01 9.557e+01 1.018e+02 1.449e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-28 03:25:20,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3331180.0, ans=0.0
2023-11-28 03:25:36,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499700
2023-11-28 03:25:38,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5
2023-11-28 03:25:40,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3331313.3333333335, ans=0.05
2023-11-28 03:25:42,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3331313.3333333335, ans=0.2
2023-11-28 03:25:57,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3331446.6666666665, ans=0.125
2023-11-28 03:26:08,894 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6750, loss[loss=0.07113, simple_loss=0.09624, pruned_loss=0.01374, audio_tagging_loss=0.009263, over 15144.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08826, pruned_loss=0.0119, audio_tagging_loss=0.008687, over 3032219.51 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:26:09,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3331513.3333333335, ans=0.1
2023-11-28 03:26:30,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3331646.6666666665, ans=0.0
2023-11-28 03:26:32,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499750
2023-11-28 03:26:32,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3331646.6666666665, ans=0.2
2023-11-28 03:26:53,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3331780.0, ans=0.125
2023-11-28 03:27:06,687 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6800, loss[loss=0.05386, simple_loss=0.07328, pruned_loss=0.007814, audio_tagging_loss=0.009408, over 15918.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08885, pruned_loss=0.01179, audio_tagging_loss=0.008567, over 3037614.02 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:27:10,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.683e+01 9.159e+01 9.907e+01 1.833e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-28 03:27:30,313 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499800
2023-11-28 03:27:35,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3331980.0, ans=0.0
2023-11-28 03:27:41,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0
2023-11-28 03:27:42,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=8.0
2023-11-28 03:28:03,818 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6850, loss[loss=0.08651, simple_loss=0.1287, pruned_loss=0.0172, audio_tagging_loss=0.004953, over 16052.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08937, pruned_loss=0.012, audio_tagging_loss=0.0085, over 3035910.16 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:28:16,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3332246.6666666665, ans=0.2
2023-11-28 03:28:28,034 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499850
2023-11-28 03:28:29,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3332313.3333333335, ans=0.0
2023-11-28 03:28:54,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3332446.6666666665, ans=0.125
2023-11-28 03:28:55,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3332446.6666666665, ans=0.125
2023-11-28 03:29:01,370 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6900, loss[loss=0.06507, simple_loss=0.08925, pruned_loss=0.01049, audio_tagging_loss=0.009946, over 14363.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09007, pruned_loss=0.01229, audio_tagging_loss=0.008474, over 3028103.27 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:29:07,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.595e+01 9.072e+01 9.849e+01 1.232e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-28 03:29:07,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3332513.3333333335, ans=0.125
2023-11-28 03:29:10,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3332513.3333333335, ans=0.1
2023-11-28 03:29:18,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3332580.0, ans=0.125
2023-11-28 03:29:22,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3332580.0, ans=0.2
2023-11-28 03:29:25,812 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499900
2023-11-28 03:29:47,274 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 03:29:59,741 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 6950, loss[loss=0.06456, simple_loss=0.08914, pruned_loss=0.008177, audio_tagging_loss=0.01181, over 15565.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09027, pruned_loss=0.01221, audio_tagging_loss=0.008428, over 3028864.89 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:30:23,213 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 499950
2023-11-28 03:30:28,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3332980.0, ans=0.125
2023-11-28 03:30:34,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3333046.6666666665, ans=0.2
2023-11-28 03:30:49,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333113.3333333335, ans=0.1
2023-11-28 03:30:53,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3333113.3333333335, ans=0.125
2023-11-28 03:30:56,304 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7000, loss[loss=0.06637, simple_loss=0.08882, pruned_loss=0.01392, audio_tagging_loss=0.008046, over 16550.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0903, pruned_loss=0.01227, audio_tagging_loss=0.008504, over 3032045.91 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:31:01,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.564e+01 9.211e+01 9.659e+01 1.272e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-28 03:31:14,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3333246.6666666665, ans=0.2
2023-11-28 03:31:15,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5
2023-11-28 03:31:20,455 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500000
2023-11-28 03:31:26,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3333313.3333333335, ans=0.2
2023-11-28 03:31:33,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3333380.0, ans=0.125
2023-11-28 03:31:39,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333380.0, ans=0.1
2023-11-28 03:31:45,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3333446.6666666665, ans=0.05
2023-11-28 03:31:55,595 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7050, loss[loss=0.05684, simple_loss=0.07045, pruned_loss=0.01227, audio_tagging_loss=0.009345, over 15152.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09051, pruned_loss=0.01239, audio_tagging_loss=0.008615, over 3033760.82 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:32:19,962 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500050
2023-11-28 03:32:22,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3333646.6666666665, ans=0.0
2023-11-28 03:32:41,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3333780.0, ans=0.025
2023-11-28 03:32:52,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3333846.6666666665, ans=0.125
2023-11-28 03:32:52,929 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7100, loss[loss=0.05723, simple_loss=0.08084, pruned_loss=0.006236, audio_tagging_loss=0.01058, over 14900.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08952, pruned_loss=0.01231, audio_tagging_loss=0.008689, over 3035231.79 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:32:54,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3333846.6666666665, ans=0.0
2023-11-28 03:32:58,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.733e+01 9.408e+01 1.010e+02 1.480e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-28 03:33:04,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3333913.3333333335, ans=0.1
2023-11-28 03:33:12,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.30 vs. limit=15.0
2023-11-28 03:33:15,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3333980.0, ans=0.0
2023-11-28 03:33:16,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500100
2023-11-28 03:33:22,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0
2023-11-28 03:33:28,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3334046.6666666665, ans=0.0
2023-11-28 03:33:49,694 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7150, loss[loss=0.06004, simple_loss=0.07715, pruned_loss=0.01258, audio_tagging_loss=0.00889, over 14148.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08912, pruned_loss=0.01212, audio_tagging_loss=0.008769, over 3037205.54 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:33:50,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0
2023-11-28 03:34:03,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3334246.6666666665, ans=0.0
2023-11-28 03:34:13,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500150
2023-11-28 03:34:13,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3334313.3333333335, ans=0.1
2023-11-28 03:34:32,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3334380.0, ans=0.125
2023-11-28 03:34:40,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3334446.6666666665, ans=0.2
2023-11-28 03:34:46,523 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7200, loss[loss=0.06086, simple_loss=0.08416, pruned_loss=0.01104, audio_tagging_loss=0.007731, over 16032.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08866, pruned_loss=0.01195, audio_tagging_loss=0.008905, over 3033405.94 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:34:48,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3334513.3333333335, ans=0.125
2023-11-28 03:34:48,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3334513.3333333335, ans=0.0
2023-11-28 03:34:51,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.895e+01 9.379e+01 1.001e+02 1.500e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-28 03:34:52,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3334513.3333333335, ans=0.2
2023-11-28 03:35:10,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500200
2023-11-28 03:35:17,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3334646.6666666665, ans=0.0
2023-11-28 03:35:28,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3334713.3333333335, ans=0.04949747468305833
2023-11-28 03:35:29,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3334713.3333333335, ans=0.1
2023-11-28 03:35:33,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3334780.0, ans=0.1
2023-11-28 03:35:35,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0
2023-11-28 03:35:35,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.44 vs.
limit=22.5 2023-11-28 03:35:41,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3334780.0, ans=0.125 2023-11-28 03:35:42,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3334846.6666666665, ans=0.125 2023-11-28 03:35:43,149 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7250, loss[loss=0.07797, simple_loss=0.1017, pruned_loss=0.01699, audio_tagging_loss=0.01014, over 15883.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08903, pruned_loss=0.01199, audio_tagging_loss=0.008989, over 3038700.86 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:36:07,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-28 03:36:09,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2023-11-28 03:36:24,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-28 03:36:36,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3335113.3333333335, ans=0.2 2023-11-28 03:36:40,982 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7300, loss[loss=0.07422, simple_loss=0.1052, pruned_loss=0.01369, audio_tagging_loss=0.007923, over 16612.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08995, pruned_loss=0.01218, audio_tagging_loss=0.008747, over 3037521.18 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:36:45,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3335180.0, ans=0.2 2023-11-28 03:36:46,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.677e+01 9.313e+01 1.019e+02 2.186e+02, threshold=1.863e+02, percent-clipped=1.0 2023-11-28 03:36:49,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2023-11-28 03:36:55,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. 
limit=22.5 2023-11-28 03:36:56,044 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:37:04,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-28 03:37:04,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3335313.3333333335, ans=0.1 2023-11-28 03:37:15,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335380.0, ans=0.1 2023-11-28 03:37:24,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3335380.0, ans=0.025 2023-11-28 03:37:25,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3335380.0, ans=0.125 2023-11-28 03:37:28,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3335446.6666666665, ans=0.125 2023-11-28 03:37:30,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3335446.6666666665, ans=0.125 2023-11-28 03:37:38,147 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7350, loss[loss=0.07968, simple_loss=0.1152, pruned_loss=0.01812, audio_tagging_loss=0.003972, over 15748.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08996, pruned_loss=0.01209, audio_tagging_loss=0.008679, over 3043684.37 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:37:41,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3335513.3333333335, ans=0.5 2023-11-28 03:37:55,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2023-11-28 03:38:02,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-28 03:38:17,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335713.3333333335, ans=0.1 2023-11-28 03:38:22,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-28 03:38:25,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3335780.0, ans=0.0 2023-11-28 03:38:30,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3335780.0, ans=0.125 2023-11-28 03:38:30,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3335780.0, ans=0.0 2023-11-28 03:38:35,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2023-11-28 03:38:35,813 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7400, loss[loss=0.0708, simple_loss=0.09928, pruned_loss=0.0127, audio_tagging_loss=0.008458, over 16379.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09016, pruned_loss=0.01214, audio_tagging_loss=0.008647, over 3050427.15 frames. 
], batch size: 60, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:38:43,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.811e+01 9.404e+01 1.022e+02 2.241e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-28 03:38:55,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3335913.3333333335, ans=15.0 2023-11-28 03:38:55,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2023-11-28 03:38:57,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335913.3333333335, ans=0.1 2023-11-28 03:38:59,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. limit=10.0 2023-11-28 03:39:00,580 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-28 03:39:32,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3336113.3333333335, ans=0.125 2023-11-28 03:39:34,658 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7450, loss[loss=0.04779, simple_loss=0.0598, pruned_loss=0.008235, audio_tagging_loss=0.009659, over 14893.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08974, pruned_loss=0.01201, audio_tagging_loss=0.008647, over 3049510.30 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:39:38,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3336180.0, ans=0.0 2023-11-28 03:39:53,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2023-11-28 03:39:58,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-28 03:40:29,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3336446.6666666665, ans=0.1 2023-11-28 03:40:31,118 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7500, loss[loss=0.07146, simple_loss=0.09046, pruned_loss=0.01589, audio_tagging_loss=0.01034, over 16224.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08949, pruned_loss=0.01211, audio_tagging_loss=0.008701, over 3049654.24 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:40:38,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.074e+01 9.605e+01 1.016e+02 1.899e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 03:40:47,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2023-11-28 03:40:55,801 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-28 03:41:23,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2023-11-28 03:41:28,466 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7550, loss[loss=0.06878, simple_loss=0.09048, pruned_loss=0.01487, audio_tagging_loss=0.008668, over 15276.00 frames. 
], tot_loss[loss=0.06538, simple_loss=0.08918, pruned_loss=0.01216, audio_tagging_loss=0.008631, over 3045162.58 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:41:32,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-28 03:41:35,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3336846.6666666665, ans=0.5 2023-11-28 03:41:46,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2023-11-28 03:41:52,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-28 03:41:57,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3336980.0, ans=0.0 2023-11-28 03:42:06,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=12.0 2023-11-28 03:42:07,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-28 03:42:14,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3337113.3333333335, ans=0.95 2023-11-28 03:42:19,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-28 03:42:22,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-28 03:42:26,172 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7600, loss[loss=0.09679, simple_loss=0.1301, pruned_loss=0.02406, audio_tagging_loss=0.00769, over 15792.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08936, pruned_loss=0.01247, audio_tagging_loss=0.008635, over 3035263.28 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:42:28,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3337180.0, ans=0.125 2023-11-28 03:42:32,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.828e+01 9.447e+01 1.020e+02 1.254e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 03:42:36,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=22.5 2023-11-28 03:42:50,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-28 03:42:51,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. 
limit=15.0 2023-11-28 03:42:57,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337313.3333333335, ans=0.1 2023-11-28 03:43:14,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337446.6666666665, ans=0.1 2023-11-28 03:43:17,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-28 03:43:23,928 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7650, loss[loss=0.06791, simple_loss=0.09442, pruned_loss=0.01373, audio_tagging_loss=0.006978, over 14436.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08966, pruned_loss=0.01245, audio_tagging_loss=0.008532, over 3039358.48 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:43:31,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3337513.3333333335, ans=10.0 2023-11-28 03:43:40,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3337580.0, ans=0.2 2023-11-28 03:43:48,278 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-28 03:43:48,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3337646.6666666665, ans=0.2 2023-11-28 03:44:17,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0 2023-11-28 03:44:21,212 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7700, loss[loss=0.07443, simple_loss=0.1023, pruned_loss=0.01649, audio_tagging_loss=0.006776, over 15285.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08953, pruned_loss=0.0124, audio_tagging_loss=0.008646, over 3046312.58 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:44:23,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337846.6666666665, ans=0.1 2023-11-28 03:44:27,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.661e+01 9.049e+01 9.903e+01 1.330e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-28 03:44:34,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3337913.3333333335, ans=0.1 2023-11-28 03:44:44,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-28 03:45:18,641 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7750, loss[loss=0.07089, simple_loss=0.09839, pruned_loss=0.01147, audio_tagging_loss=0.01023, over 14655.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08884, pruned_loss=0.01233, audio_tagging_loss=0.008677, over 3044138.67 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:45:30,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3338246.6666666665, ans=0.035 2023-11-28 03:45:43,003 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-28 03:45:54,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
limit=22.5 2023-11-28 03:46:15,553 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7800, loss[loss=0.06212, simple_loss=0.07845, pruned_loss=0.01211, audio_tagging_loss=0.01079, over 14443.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08968, pruned_loss=0.0125, audio_tagging_loss=0.008661, over 3041150.98 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:46:22,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.833e+01 9.588e+01 1.059e+02 1.292e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 03:46:36,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3338580.0, ans=0.1 2023-11-28 03:46:37,857 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:46:39,838 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-28 03:46:52,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3338713.3333333335, ans=0.125 2023-11-28 03:47:13,907 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7850, loss[loss=0.04718, simple_loss=0.05897, pruned_loss=0.005971, audio_tagging_loss=0.01172, over 14971.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08954, pruned_loss=0.01258, audio_tagging_loss=0.008794, over 3039178.21 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:47:22,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3338846.6666666665, ans=0.1 2023-11-28 03:47:37,939 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-28 03:48:10,414 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7900, loss[loss=0.08032, simple_loss=0.1166, pruned_loss=0.01568, audio_tagging_loss=0.006346, over 15434.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09005, pruned_loss=0.01253, audio_tagging_loss=0.008757, over 3046886.93 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:48:17,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.758e+01 9.324e+01 1.005e+02 1.322e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 03:48:28,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3339246.6666666665, ans=0.0 2023-11-28 03:48:33,972 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-28 03:48:36,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3339313.3333333335, ans=0.2 2023-11-28 03:48:43,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3339380.0, ans=0.1 2023-11-28 03:48:45,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3339380.0, ans=0.0 2023-11-28 03:49:06,820 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 7950, loss[loss=0.08916, simple_loss=0.1334, pruned_loss=0.01459, audio_tagging_loss=0.007873, over 15594.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09068, pruned_loss=0.01259, audio_tagging_loss=0.008836, over 3053179.46 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:49:24,554 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:49:25,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.52 vs. limit=22.5 2023-11-28 03:49:31,092 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-28 03:49:32,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3339646.6666666665, ans=0.0 2023-11-28 03:49:43,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3339713.3333333335, ans=0.0 2023-11-28 03:50:04,241 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8000, loss[loss=0.0543, simple_loss=0.07181, pruned_loss=0.008522, audio_tagging_loss=0.009873, over 14560.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08929, pruned_loss=0.01228, audio_tagging_loss=0.009018, over 3050530.77 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:50:07,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3339846.6666666665, ans=0.125 2023-11-28 03:50:11,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.539e+01 9.143e+01 9.818e+01 1.375e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-28 03:50:18,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3339913.3333333335, ans=0.1 2023-11-28 03:50:28,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-28 03:50:32,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3339980.0, ans=0.125 2023-11-28 03:50:51,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3340113.3333333335, ans=0.2 2023-11-28 03:50:52,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3340113.3333333335, ans=0.0 2023-11-28 03:50:59,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3340113.3333333335, ans=0.0 2023-11-28 03:51:02,063 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8050, loss[loss=0.06025, simple_loss=0.07418, pruned_loss=0.0111, audio_tagging_loss=0.01207, over 15168.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08997, pruned_loss=0.01231, audio_tagging_loss=0.009011, over 3046324.42 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:51:08,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.78 vs. 
limit=10.0 2023-11-28 03:51:11,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-11-28 03:51:26,197 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-28 03:51:28,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3340313.3333333335, ans=0.2 2023-11-28 03:51:31,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3340313.3333333335, ans=0.125 2023-11-28 03:51:37,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-11-28 03:51:39,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3340380.0, ans=0.0 2023-11-28 03:51:54,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3340446.6666666665, ans=0.2 2023-11-28 03:52:00,075 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8100, loss[loss=0.07244, simple_loss=0.09264, pruned_loss=0.01817, audio_tagging_loss=0.007952, over 15471.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09071, pruned_loss=0.01242, audio_tagging_loss=0.008865, over 3042745.92 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:52:07,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.647e+01 9.377e+01 1.005e+02 1.143e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:52:24,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-28 03:52:37,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340713.3333333335, ans=0.1 2023-11-28 03:52:48,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=22.5 2023-11-28 03:52:54,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5 2023-11-28 03:52:56,856 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8150, loss[loss=0.07144, simple_loss=0.09473, pruned_loss=0.01653, audio_tagging_loss=0.00755, over 14778.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09124, pruned_loss=0.01252, audio_tagging_loss=0.008723, over 3043269.87 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:18,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2023-11-28 03:53:19,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. 
limit=15.0 2023-11-28 03:53:21,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-28 03:53:21,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3340980.0, ans=0.125 2023-11-28 03:53:24,682 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:53:24,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3340980.0, ans=0.125 2023-11-28 03:53:31,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3341046.6666666665, ans=0.025 2023-11-28 03:53:42,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3341113.3333333335, ans=0.125 2023-11-28 03:53:45,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3341113.3333333335, ans=0.1 2023-11-28 03:53:53,993 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8200, loss[loss=0.06342, simple_loss=0.09352, pruned_loss=0.009873, audio_tagging_loss=0.00679, over 15597.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09118, pruned_loss=0.01241, audio_tagging_loss=0.008641, over 3045685.58 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:57,343 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:54:02,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.802e+01 9.578e+01 1.025e+02 1.373e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 03:54:06,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3341246.6666666665, ans=0.05 2023-11-28 03:54:13,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3341246.6666666665, ans=0.0 2023-11-28 03:54:17,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-28 03:54:17,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3341313.3333333335, ans=0.125 2023-11-28 03:54:25,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3341313.3333333335, ans=0.125 2023-11-28 03:54:25,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2023-11-28 03:54:33,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-28 03:54:51,791 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8250, loss[loss=0.06186, simple_loss=0.08375, pruned_loss=0.01117, audio_tagging_loss=0.008811, over 14926.00 frames. 
], tot_loss[loss=0.06587, simple_loss=0.09015, pruned_loss=0.01225, audio_tagging_loss=0.00854, over 3044026.16 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:55:00,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-28 03:55:15,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-28 03:55:33,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3341713.3333333335, ans=0.2 2023-11-28 03:55:42,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3341780.0, ans=0.0 2023-11-28 03:55:42,375 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:55:46,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3341780.0, ans=0.125 2023-11-28 03:55:48,584 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8300, loss[loss=0.0666, simple_loss=0.08545, pruned_loss=0.01318, audio_tagging_loss=0.0107, over 14008.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09018, pruned_loss=0.01219, audio_tagging_loss=0.008478, over 3046733.49 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:55:56,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.790e+01 9.364e+01 1.000e+02 1.308e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:56:08,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3341913.3333333335, ans=0.1 2023-11-28 03:56:08,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3341913.3333333335, ans=0.125 2023-11-28 03:56:13,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-28 03:56:19,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3341980.0, ans=0.0 2023-11-28 03:56:27,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-11-28 03:56:45,984 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8350, loss[loss=0.04696, simple_loss=0.0625, pruned_loss=0.005902, audio_tagging_loss=0.009803, over 15236.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09009, pruned_loss=0.01228, audio_tagging_loss=0.008438, over 3048196.71 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:57:08,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3342313.3333333335, ans=0.0 2023-11-28 03:57:09,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=12.0 2023-11-28 03:57:10,732 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-28 03:57:26,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.06 vs. 
limit=15.0 2023-11-28 03:57:27,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3342380.0, ans=0.125 2023-11-28 03:57:32,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3342446.6666666665, ans=0.0 2023-11-28 03:57:35,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3342446.6666666665, ans=0.95 2023-11-28 03:57:43,977 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8400, loss[loss=0.07365, simple_loss=0.1042, pruned_loss=0.01213, audio_tagging_loss=0.00943, over 14633.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09041, pruned_loss=0.01236, audio_tagging_loss=0.008361, over 3042994.53 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:57:45,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3342513.3333333335, ans=0.125 2023-11-28 03:57:51,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.873e+01 9.503e+01 1.023e+02 1.226e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 03:57:57,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=10.0 2023-11-28 03:58:07,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-28 03:58:41,312 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8450, loss[loss=0.05049, simple_loss=0.06077, pruned_loss=0.009809, audio_tagging_loss=0.01029, over 14456.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08954, pruned_loss=0.0122, audio_tagging_loss=0.00848, over 3047325.76 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:58:46,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3342846.6666666665, ans=0.1 2023-11-28 03:58:46,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-28 03:58:51,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3342913.3333333335, ans=0.07 2023-11-28 03:58:57,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3342913.3333333335, ans=0.1 2023-11-28 03:59:00,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3342913.3333333335, ans=0.0 2023-11-28 03:59:02,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3342980.0, ans=0.0 2023-11-28 03:59:05,871 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-28 03:59:17,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3343046.6666666665, ans=0.0 2023-11-28 03:59:22,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3343046.6666666665, ans=0.2 2023-11-28 03:59:39,103 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8500, loss[loss=0.08493, simple_loss=0.112, pruned_loss=0.0187, audio_tagging_loss=0.01025, over 15556.00 frames. 
], tot_loss[loss=0.066, simple_loss=0.09017, pruned_loss=0.01236, audio_tagging_loss=0.008555, over 3054514.88 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:59:46,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.888e+01 9.285e+01 1.024e+02 1.288e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 04:00:03,307 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-28 04:00:16,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3343380.0, ans=0.0 2023-11-28 04:00:36,609 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8550, loss[loss=0.07469, simple_loss=0.1043, pruned_loss=0.01474, audio_tagging_loss=0.00781, over 14948.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09121, pruned_loss=0.01252, audio_tagging_loss=0.008581, over 3054487.87 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:00:49,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=12.0 2023-11-28 04:01:00,886 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-28 04:01:02,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3343646.6666666665, ans=0.2 2023-11-28 04:01:05,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3343646.6666666665, ans=0.2 2023-11-28 04:01:08,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3343646.6666666665, ans=0.125 2023-11-28 04:01:09,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-11-28 04:01:20,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3343713.3333333335, ans=0.0 2023-11-28 04:01:20,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3343713.3333333335, ans=0.07 2023-11-28 04:01:30,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3343780.0, ans=0.02 2023-11-28 04:01:33,869 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8600, loss[loss=0.05694, simple_loss=0.08116, pruned_loss=0.006647, audio_tagging_loss=0.00971, over 15967.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09141, pruned_loss=0.0125, audio_tagging_loss=0.008638, over 3054129.24 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:01:42,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.741e+01 9.411e+01 9.975e+01 1.880e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 04:01:52,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3343913.3333333335, ans=0.09899494936611666 2023-11-28 04:01:57,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-28 04:01:58,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. 
limit=15.0 2023-11-28 04:02:05,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3343980.0, ans=0.125 2023-11-28 04:02:10,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2023-11-28 04:02:10,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-28 04:02:12,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3344046.6666666665, ans=0.0 2023-11-28 04:02:18,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-28 04:02:20,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3344113.3333333335, ans=0.0 2023-11-28 04:02:31,102 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8650, loss[loss=0.04621, simple_loss=0.05649, pruned_loss=0.008348, audio_tagging_loss=0.009613, over 15458.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09047, pruned_loss=0.01237, audio_tagging_loss=0.00871, over 3058386.30 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:02:55,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-28 04:03:09,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3344380.0, ans=0.2 2023-11-28 04:03:24,927 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:03:26,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3344446.6666666665, ans=0.0 2023-11-28 04:03:28,915 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8700, loss[loss=0.06014, simple_loss=0.07965, pruned_loss=0.009162, audio_tagging_loss=0.01116, over 15463.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09104, pruned_loss=0.01229, audio_tagging_loss=0.008738, over 3062156.19 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:03:32,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3344513.3333333335, ans=0.0 2023-11-28 04:03:34,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3344513.3333333335, ans=0.1 2023-11-28 04:03:37,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.850e+01 9.398e+01 9.849e+01 1.274e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 04:03:53,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-28 04:03:53,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3344646.6666666665, ans=0.125 2023-11-28 04:04:13,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3344780.0, ans=0.125 2023-11-28 04:04:26,116 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8750, loss[loss=0.05923, simple_loss=0.08144, pruned_loss=0.009407, audio_tagging_loss=0.0091, over 15224.00 frames. 
], tot_loss[loss=0.06652, simple_loss=0.09092, pruned_loss=0.01236, audio_tagging_loss=0.008699, over 3053381.96 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:04:29,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3344846.6666666665, ans=0.0 2023-11-28 04:04:30,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3344846.6666666665, ans=0.2 2023-11-28 04:04:48,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3344980.0, ans=0.2 2023-11-28 04:04:49,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-28 04:05:03,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-11-28 04:05:22,945 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8800, loss[loss=0.06738, simple_loss=0.0852, pruned_loss=0.01088, audio_tagging_loss=0.01391, over 15337.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09105, pruned_loss=0.01233, audio_tagging_loss=0.008902, over 3059249.46 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:05:31,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.835e+01 9.360e+01 1.012e+02 1.261e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 04:05:46,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-28 04:06:05,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3345380.0, ans=0.0 2023-11-28 04:06:08,987 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:06:19,645 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8850, loss[loss=0.04824, simple_loss=0.065, pruned_loss=0.007924, audio_tagging_loss=0.007813, over 15644.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09104, pruned_loss=0.01238, audio_tagging_loss=0.008893, over 3059254.27 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:06:32,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-11-28 04:06:34,729 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:06:38,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3345580.0, ans=0.2 2023-11-28 04:06:44,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-28 04:07:16,686 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8900, loss[loss=0.07637, simple_loss=0.1001, pruned_loss=0.01786, audio_tagging_loss=0.008444, over 14907.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09202, pruned_loss=0.01268, audio_tagging_loss=0.008714, over 3055484.36 frames. 
], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:07:21,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3345846.6666666665, ans=0.1 2023-11-28 04:07:22,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3345846.6666666665, ans=0.0 2023-11-28 04:07:25,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3345846.6666666665, ans=0.0 2023-11-28 04:07:25,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.854e+01 9.513e+01 9.955e+01 1.488e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 04:07:33,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3345913.3333333335, ans=0.125 2023-11-28 04:07:40,818 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-28 04:07:41,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3345980.0, ans=0.125 2023-11-28 04:08:02,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3346113.3333333335, ans=0.95 2023-11-28 04:08:08,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-28 04:08:14,223 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 8950, loss[loss=0.06604, simple_loss=0.08618, pruned_loss=0.01294, audio_tagging_loss=0.01001, over 15301.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09241, pruned_loss=0.01275, audio_tagging_loss=0.008599, over 3058167.51 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:08:17,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3346180.0, ans=0.125 2023-11-28 04:08:29,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-28 04:08:34,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3346246.6666666665, ans=0.125 2023-11-28 04:08:37,875 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-28 04:08:38,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.74 vs. 
limit=15.0 2023-11-28 04:08:41,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3346313.3333333335, ans=0.2 2023-11-28 04:08:43,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3346313.3333333335, ans=0.1 2023-11-28 04:08:48,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3346380.0, ans=0.0 2023-11-28 04:08:50,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3346380.0, ans=0.125 2023-11-28 04:08:53,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3346380.0, ans=0.2 2023-11-28 04:09:10,177 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9000, loss[loss=0.06111, simple_loss=0.08641, pruned_loss=0.009533, audio_tagging_loss=0.008374, over 16200.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09241, pruned_loss=0.01277, audio_tagging_loss=0.008553, over 3052249.73 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:09:10,177 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 04:09:44,947 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05915, simple_loss=0.05063, pruned_loss=0.005264, audio_tagging_loss=0.02857, over 4681554.00 frames. 2023-11-28 04:09:44,948 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 04:09:46,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0 2023-11-28 04:09:50,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3346513.3333333335, ans=0.0 2023-11-28 04:09:54,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.664e+01 9.503e+01 1.037e+02 1.475e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 04:09:56,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3346580.0, ans=0.125 2023-11-28 04:10:03,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-11-28 04:10:09,076 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-28 04:10:22,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3346713.3333333335, ans=0.5 2023-11-28 04:10:28,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3346713.3333333335, ans=0.1 2023-11-28 04:10:29,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3346780.0, ans=0.0 2023-11-28 04:10:29,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3346780.0, ans=0.0 2023-11-28 04:10:43,077 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9050, loss[loss=0.06817, simple_loss=0.08794, pruned_loss=0.01555, audio_tagging_loss=0.00865, over 14657.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09205, pruned_loss=0.01281, audio_tagging_loss=0.008533, over 3050528.62 frames. 
], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:10:44,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3346846.6666666665, ans=0.125 2023-11-28 04:10:54,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3346913.3333333335, ans=0.125 2023-11-28 04:11:06,641 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-28 04:11:07,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3346980.0, ans=0.0 2023-11-28 04:11:40,121 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9100, loss[loss=0.06993, simple_loss=0.0954, pruned_loss=0.01278, audio_tagging_loss=0.009454, over 15163.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09152, pruned_loss=0.01265, audio_tagging_loss=0.008528, over 3055345.43 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:11:45,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3347180.0, ans=0.2 2023-11-28 04:11:48,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.691e+01 9.383e+01 1.014e+02 1.282e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 04:12:03,017 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-28 04:12:03,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3347313.3333333335, ans=0.1 2023-11-28 04:12:04,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347313.3333333335, ans=0.1 2023-11-28 04:12:07,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3347313.3333333335, ans=0.125 2023-11-28 04:12:22,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3347380.0, ans=0.2 2023-11-28 04:12:26,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3347446.6666666665, ans=0.125 2023-11-28 04:12:36,769 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9150, loss[loss=0.06723, simple_loss=0.09433, pruned_loss=0.01263, audio_tagging_loss=0.007437, over 14413.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09112, pruned_loss=0.0126, audio_tagging_loss=0.008508, over 3046583.26 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:13:01,309 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-28 04:13:10,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3347713.3333333335, ans=0.0 2023-11-28 04:13:12,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3347713.3333333335, ans=0.125 2023-11-28 04:13:23,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3347780.0, ans=0.0 2023-11-28 04:13:34,148 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9200, loss[loss=0.06972, simple_loss=0.09368, pruned_loss=0.01406, audio_tagging_loss=0.008815, over 14874.00 frames. 
], tot_loss[loss=0.06657, simple_loss=0.09101, pruned_loss=0.01257, audio_tagging_loss=0.008494, over 3051662.02 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:13:35,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3347846.6666666665, ans=0.2 2023-11-28 04:13:44,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.837e+01 9.520e+01 1.030e+02 1.268e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 04:13:58,692 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-28 04:14:15,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3348046.6666666665, ans=0.125 2023-11-28 04:14:16,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3348046.6666666665, ans=0.125 2023-11-28 04:14:28,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-11-28 04:14:32,133 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9250, loss[loss=0.06073, simple_loss=0.08029, pruned_loss=0.011, audio_tagging_loss=0.009586, over 14664.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09055, pruned_loss=0.01244, audio_tagging_loss=0.008537, over 3053844.25 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:14:55,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-28 04:15:13,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3348380.0, ans=0.1 2023-11-28 04:15:15,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3348380.0, ans=0.2 2023-11-28 04:15:16,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-11-28 04:15:29,233 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9300, loss[loss=0.0667, simple_loss=0.09236, pruned_loss=0.01146, audio_tagging_loss=0.00906, over 14058.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09046, pruned_loss=0.01237, audio_tagging_loss=0.008593, over 3056058.70 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:15:32,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3348513.3333333335, ans=0.0 2023-11-28 04:15:37,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3348513.3333333335, ans=0.125 2023-11-28 04:15:40,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.934e+01 9.500e+01 1.008e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 04:15:42,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3348580.0, ans=0.125 2023-11-28 04:15:43,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3348580.0, ans=0.125 2023-11-28 04:15:46,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.27 vs. 
limit=12.0 2023-11-28 04:15:47,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3348580.0, ans=0.1 2023-11-28 04:15:48,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3348580.0, ans=0.125 2023-11-28 04:15:53,809 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-28 04:15:58,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=22.5 2023-11-28 04:16:26,570 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9350, loss[loss=0.06457, simple_loss=0.09214, pruned_loss=0.0115, audio_tagging_loss=0.006994, over 15124.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09069, pruned_loss=0.01236, audio_tagging_loss=0.008522, over 3055786.26 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:16:50,828 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-28 04:16:53,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3348980.0, ans=0.0 2023-11-28 04:17:09,276 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:17:15,663 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:17:23,704 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9400, loss[loss=0.05371, simple_loss=0.0729, pruned_loss=0.009965, audio_tagging_loss=0.0073, over 15464.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09044, pruned_loss=0.01233, audio_tagging_loss=0.008585, over 3058558.67 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:17:24,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3349180.0, ans=0.125 2023-11-28 04:17:28,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3349180.0, ans=0.0 2023-11-28 04:17:35,166 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.937e+01 9.623e+01 1.033e+02 2.333e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-28 04:17:40,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-28 04:17:47,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-28 04:17:50,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3349313.3333333335, ans=0.1 2023-11-28 04:17:59,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3349380.0, ans=0.1 2023-11-28 04:18:21,493 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9450, loss[loss=0.05773, simple_loss=0.07838, pruned_loss=0.008234, audio_tagging_loss=0.0103, over 15878.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09088, pruned_loss=0.01234, audio_tagging_loss=0.008611, over 3056437.47 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:18:23,693 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:18:29,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 04:18:31,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3349580.0, ans=0.0 2023-11-28 04:18:32,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3349580.0, ans=0.05 2023-11-28 04:18:45,204 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-28 04:18:46,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0 2023-11-28 04:19:02,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3349713.3333333335, ans=0.125 2023-11-28 04:19:03,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3349713.3333333335, ans=0.125 2023-11-28 04:19:13,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349780.0, ans=0.1 2023-11-28 04:19:18,891 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9500, loss[loss=0.06806, simple_loss=0.08782, pruned_loss=0.0124, audio_tagging_loss=0.01175, over 14696.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09054, pruned_loss=0.01232, audio_tagging_loss=0.00875, over 3055216.91 frames. 
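The WARNING above shows why such cuts are dropped: this AudioSet clip is 1 s long (100 frames at 10 ms), which leaves only 23 frames after the encoder front-end's roughly 4x subsampling, fewer than its 24 BPE tokens, and the pruned transducer loss needs at least one frame per token. A minimal sketch of that check; the exact subsampling arithmetic is an assumption inferred from the logged 100 -> 23 mapping:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames surviving the convolutional front-end's ~4x subsampling
        # (assumed formula, consistent with 100 -> 23 in the warning above).
        t = ((num_frames - 7) // 2 + 1) // 2
        # The transducer needs at least as many frames as output tokens.
        return t >= num_tokens

    keep_cut(100, 24)   # False: the cut is excluded, as logged

All of the excluded IDs in this section are AudioSet clips carrying the dummy placeholder transcript, so nothing useful to the ASR objective is lost.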
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:19:29,923 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.748e+01 9.346e+01 1.036e+02 1.231e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 04:19:34,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3349913.3333333335, ans=0.0 2023-11-28 04:19:43,134 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-28 04:19:47,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3349980.0, ans=0.0 2023-11-28 04:19:48,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3349980.0, ans=0.125 2023-11-28 04:19:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3349980.0, ans=0.125 2023-11-28 04:19:54,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3350046.6666666665, ans=0.125 2023-11-28 04:19:54,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3350046.6666666665, ans=0.07 2023-11-28 04:19:56,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3350046.6666666665, ans=0.125 2023-11-28 04:19:57,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-28 04:20:15,479 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9550, loss[loss=0.075, simple_loss=0.1025, pruned_loss=0.01471, audio_tagging_loss=0.00904, over 15372.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09015, pruned_loss=0.0123, audio_tagging_loss=0.008902, over 3059323.79 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:20:23,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3350180.0, ans=0.0 2023-11-28 04:20:39,869 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-28 04:20:40,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3350313.3333333335, ans=0.125 2023-11-28 04:20:49,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3350380.0, ans=0.025 2023-11-28 04:20:50,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3350380.0, ans=0.125 2023-11-28 04:20:53,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3350380.0, ans=0.04949747468305833 2023-11-28 04:21:06,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3350446.6666666665, ans=0.1 2023-11-28 04:21:13,658 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9600, loss[loss=0.05032, simple_loss=0.06396, pruned_loss=0.009859, audio_tagging_loss=0.00848, over 14721.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08972, pruned_loss=0.01239, audio_tagging_loss=0.009063, over 3061572.14 frames. 
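The ScheduledFloat lines record hyperparameters (dropout probabilities, skip rates, balancer probabilities) that are interpolated piecewise-linearly in batch_count. By batch_count ~3.35e6 every schedule is far past its final breakpoint, which is why the printed values (ans=0.1, 0.125, 0.0, ...) are constants. A small sketch of the interpolation; the breakpoints shown are illustrative assumptions, not values from this run:

    def scheduled_float(batch_count: float, schedule) -> float:
        """schedule: sorted (batch_count, value) pairs, e.g. [(0.0, 0.3), (20000.0, 0.1)]."""
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                # Linear interpolation between neighbouring breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return schedule[-1][1]   # past the last breakpoint: constant

    print(scheduled_float(3345980.0, [(0.0, 0.3), (20000.0, 0.1)]))   # 0.1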
], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:21:18,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3350513.3333333335, ans=0.07 2023-11-28 04:21:24,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 8.754e+01 9.206e+01 1.000e+02 1.278e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 04:21:36,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2023-11-28 04:21:37,317 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-28 04:21:45,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3350646.6666666665, ans=0.125 2023-11-28 04:21:50,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3350713.3333333335, ans=0.2 2023-11-28 04:21:59,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3350780.0, ans=0.09899494936611666 2023-11-28 04:22:01,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3350780.0, ans=0.1 2023-11-28 04:22:08,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2023-11-28 04:22:10,873 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9650, loss[loss=0.05769, simple_loss=0.0844, pruned_loss=0.009754, audio_tagging_loss=0.005735, over 13767.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09009, pruned_loss=0.01258, audio_tagging_loss=0.008947, over 3054002.12 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:22:35,689 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-28 04:22:57,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3351113.3333333335, ans=0.125 2023-11-28 04:22:59,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3351113.3333333335, ans=10.0 2023-11-28 04:23:08,617 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9700, loss[loss=0.09745, simple_loss=0.1378, pruned_loss=0.02141, audio_tagging_loss=0.007131, over 15647.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09144, pruned_loss=0.01281, audio_tagging_loss=0.008738, over 3047646.64 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:23:20,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. 
limit=10.0 2023-11-28 04:23:21,611 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.660e+01 9.403e+01 1.036e+02 1.751e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 04:23:33,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-28 04:23:35,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3351313.3333333335, ans=0.125 2023-11-28 04:23:50,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.527e-03 2023-11-28 04:24:06,614 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9750, loss[loss=0.06559, simple_loss=0.09705, pruned_loss=0.01052, audio_tagging_loss=0.006542, over 14783.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09136, pruned_loss=0.01272, audio_tagging_loss=0.008639, over 3048258.95 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:24:11,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3351513.3333333335, ans=0.2 2023-11-28 04:24:13,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2023-11-28 04:24:13,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3351513.3333333335, ans=0.125 2023-11-28 04:24:30,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-28 04:24:37,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3351646.6666666665, ans=0.125 2023-11-28 04:25:01,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3351780.0, ans=0.125 2023-11-28 04:25:04,294 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9800, loss[loss=0.04915, simple_loss=0.06879, pruned_loss=0.009148, audio_tagging_loss=0.005609, over 13254.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09117, pruned_loss=0.01263, audio_tagging_loss=0.008534, over 3040799.88 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:25:16,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.861e+01 9.508e+01 1.028e+02 1.749e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 04:25:28,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-28 04:25:41,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3352046.6666666665, ans=0.125 2023-11-28 04:25:42,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3352046.6666666665, ans=0.0 2023-11-28 04:25:59,723 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 04:26:01,886 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9850, loss[loss=0.06626, simple_loss=0.09325, pruned_loss=0.008145, audio_tagging_loss=0.01149, over 15653.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09148, pruned_loss=0.01286, audio_tagging_loss=0.008496, over 3043027.80 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:26:02,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352180.0, ans=0.1 2023-11-28 04:26:17,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3352246.6666666665, ans=0.2 2023-11-28 04:26:26,286 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-28 04:26:26,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3352313.3333333335, ans=0.125 2023-11-28 04:26:31,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3352313.3333333335, ans=0.125 2023-11-28 04:26:34,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3352313.3333333335, ans=0.125 2023-11-28 04:26:36,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3352380.0, ans=0.0 2023-11-28 04:26:51,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352446.6666666665, ans=0.1 2023-11-28 04:26:57,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3352446.6666666665, ans=0.0 2023-11-28 04:26:57,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2023-11-28 04:26:59,727 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9900, loss[loss=0.06907, simple_loss=0.09026, pruned_loss=0.01649, audio_tagging_loss=0.007455, over 15139.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09195, pruned_loss=0.01287, audio_tagging_loss=0.008447, over 3044554.65 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:27:03,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3352513.3333333335, ans=0.0 2023-11-28 04:27:06,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3352513.3333333335, ans=0.2 2023-11-28 04:27:12,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.663e+01 9.354e+01 9.948e+01 1.345e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 04:27:19,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.21 vs. 
limit=12.0 2023-11-28 04:27:20,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3352580.0, ans=0.02 2023-11-28 04:27:20,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352580.0, ans=0.1 2023-11-28 04:27:23,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-28 04:27:34,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3352713.3333333335, ans=0.0 2023-11-28 04:27:41,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.08 vs. limit=22.5 2023-11-28 04:27:48,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3352780.0, ans=0.0 2023-11-28 04:27:57,191 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 9950, loss[loss=0.06971, simple_loss=0.09778, pruned_loss=0.01307, audio_tagging_loss=0.007747, over 16190.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09157, pruned_loss=0.01268, audio_tagging_loss=0.008415, over 3051764.78 frames. ], batch size: 63, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:28:05,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-28 04:28:13,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3352913.3333333335, ans=0.125 2023-11-28 04:28:20,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-28 04:28:29,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352980.0, ans=0.1 2023-11-28 04:28:34,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3353046.6666666665, ans=0.0 2023-11-28 04:28:50,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3353113.3333333335, ans=0.0 2023-11-28 04:28:54,845 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10000, loss[loss=0.06301, simple_loss=0.08709, pruned_loss=0.01074, audio_tagging_loss=0.008725, over 14362.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09124, pruned_loss=0.01248, audio_tagging_loss=0.008434, over 3047564.88 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:08,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.771e+01 9.442e+01 1.017e+02 1.444e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 04:29:18,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-28 04:29:30,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3353380.0, ans=0.125 2023-11-28 04:29:52,396 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10050, loss[loss=0.06605, simple_loss=0.09066, pruned_loss=0.0113, audio_tagging_loss=0.009416, over 14394.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.0906, pruned_loss=0.01244, audio_tagging_loss=0.008532, over 3040585.72 frames. 
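The optim.py lines summarise the recent distribution of per-batch gradient norms as five quantiles (min, 25%, median, 75%, max). In every entry the logged threshold equals Clipping_scale times the median, e.g. 2.0 * 9.442e+01 = 1.888e+02 in the batch 10000 entry above, so the clipping rule appears to be median-based; this is an inference from the numbers, not from the source. A sketch under that assumption:

    import numpy as np

    def gradient_clip_stats(grad_norms: np.ndarray, clipping_scale: float = 2.0):
        quartiles = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * quartiles[2]            # 2.0 x median
        percent_clipped = 100.0 * np.mean(grad_norms > threshold)
        return quartiles, threshold, percent_clipped

percent-clipped stays at 0.0 through most of this section; the isolated 1.0 readings (e.g. the batch 9400 entry earlier, whose max 2.333e+02 exceeds its threshold 1.925e+02) mark the rare batches whose gradient norm crossed the threshold.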
], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:57,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3353513.3333333335, ans=0.0 2023-11-28 04:30:04,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3353580.0, ans=0.0 2023-11-28 04:30:05,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. limit=10.0 2023-11-28 04:30:17,458 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-28 04:30:18,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3353646.6666666665, ans=0.04949747468305833 2023-11-28 04:30:27,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-28 04:30:29,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=22.5 2023-11-28 04:30:30,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3353713.3333333335, ans=0.0 2023-11-28 04:30:50,291 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10100, loss[loss=0.05445, simple_loss=0.07375, pruned_loss=0.00765, audio_tagging_loss=0.009924, over 14119.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09081, pruned_loss=0.01259, audio_tagging_loss=0.008534, over 3044057.39 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:31:04,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.581e+01 9.372e+01 1.014e+02 1.280e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 04:31:10,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3353913.3333333335, ans=15.0 2023-11-28 04:31:14,657 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-28 04:31:17,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3353980.0, ans=0.125 2023-11-28 04:31:39,653 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:31:47,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3354180.0, ans=0.0 2023-11-28 04:31:48,556 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10150, loss[loss=0.05584, simple_loss=0.0709, pruned_loss=0.009906, audio_tagging_loss=0.01048, over 14892.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09136, pruned_loss=0.01255, audio_tagging_loss=0.008592, over 3049698.03 frames. 
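Note that tot_loss is not the loss of a single batch: its frame counts hover near 3.05e6, about reset_interval (200 in the configuration) times the ~15e3 frames of one batch. That is consistent with a geometrically decaying running sum with factor 1 - 1/reset_interval, whose steady-state window is exactly 200 batches; the precise bookkeeping is an assumption. A sketch:

    class RunningLoss:
        """Decaying running average; assumed mechanism behind tot_loss."""
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frame_sum = self.decay * self.frame_sum + batch_frames
            return self.loss_sum / self.frame_sum   # the reported tot_loss

In steady state frame_sum converges to batch_frames / (1 - decay), i.e. 200 batches' worth, matching the ~3.0e6 frames printed in the tot_loss entries.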
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:31:56,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3354180.0, ans=0.0 2023-11-28 04:32:07,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3354246.6666666665, ans=0.0 2023-11-28 04:32:12,549 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-28 04:32:18,938 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:32:22,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3354380.0, ans=0.1 2023-11-28 04:32:45,392 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10200, loss[loss=0.05294, simple_loss=0.06058, pruned_loss=0.00899, audio_tagging_loss=0.01366, over 14829.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09096, pruned_loss=0.01256, audio_tagging_loss=0.008707, over 3054712.94 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:32:59,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.630e+01 9.209e+01 1.011e+02 1.470e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 04:33:09,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-28 04:33:10,774 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:33:16,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3354646.6666666665, ans=0.2 2023-11-28 04:33:24,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3354713.3333333335, ans=0.2 2023-11-28 04:33:28,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3354713.3333333335, ans=0.04949747468305833 2023-11-28 04:33:31,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=10.0 2023-11-28 04:33:41,817 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10250, loss[loss=0.05754, simple_loss=0.08052, pruned_loss=0.00845, audio_tagging_loss=0.008826, over 15581.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08947, pruned_loss=0.01234, audio_tagging_loss=0.008814, over 3054635.71 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:33:47,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.00 vs. 
limit=22.5 2023-11-28 04:34:01,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3354913.3333333335, ans=0.125 2023-11-28 04:34:05,868 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-28 04:34:09,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2023-11-28 04:34:15,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3355046.6666666665, ans=0.125 2023-11-28 04:34:23,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3355046.6666666665, ans=0.5 2023-11-28 04:34:38,554 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10300, loss[loss=0.08994, simple_loss=0.1216, pruned_loss=0.02228, audio_tagging_loss=0.006874, over 15789.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08971, pruned_loss=0.01244, audio_tagging_loss=0.008769, over 3054783.23 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:34:40,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3355180.0, ans=0.125 2023-11-28 04:34:51,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.001e+01 9.538e+01 1.014e+02 1.211e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 04:35:02,849 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-28 04:35:16,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3355380.0, ans=0.125 2023-11-28 04:35:19,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3355380.0, ans=0.125 2023-11-28 04:35:27,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3355446.6666666665, ans=0.125 2023-11-28 04:35:35,758 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10350, loss[loss=0.07187, simple_loss=0.09353, pruned_loss=0.01489, audio_tagging_loss=0.01021, over 15863.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0896, pruned_loss=0.01243, audio_tagging_loss=0.008876, over 3057358.16 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:35:46,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.78 vs. limit=12.0 2023-11-28 04:35:59,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-28 04:36:00,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3355646.6666666665, ans=0.07 2023-11-28 04:36:31,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3355846.6666666665, ans=0.125 2023-11-28 04:36:32,725 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10400, loss[loss=0.08578, simple_loss=0.1173, pruned_loss=0.0189, audio_tagging_loss=0.008231, over 15230.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08941, pruned_loss=0.01246, audio_tagging_loss=0.008998, over 3060777.13 frames. 
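The Whitening lines report, per module, a statistic of how far the (grouped) feature covariance is from white, printed against that module's limit. The exact metric is not shown in the log; one natural choice with the right behaviour, used here purely as an illustration, is mean(eigenvalues^2) / mean(eigenvalues)^2 of the covariance, which equals 1.0 for perfectly white features and grows as variance concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Illustrative metric, not icefall's code."""
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0, keepdim=True)
            cov = (g.T @ g) / g.shape[0]
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
        return float(torch.stack(metrics).mean())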
], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:36:47,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.770e+01 9.452e+01 1.025e+02 1.480e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 04:36:50,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3355913.3333333335, ans=0.02 2023-11-28 04:36:51,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3355913.3333333335, ans=0.0 2023-11-28 04:36:56,932 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-28 04:36:59,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. limit=10.0 2023-11-28 04:37:04,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3355980.0, ans=0.0 2023-11-28 04:37:30,483 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10450, loss[loss=0.06537, simple_loss=0.09432, pruned_loss=0.008416, audio_tagging_loss=0.009796, over 15047.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08935, pruned_loss=0.01239, audio_tagging_loss=0.008972, over 3051432.43 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:37:31,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3356180.0, ans=0.0 2023-11-28 04:37:51,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3356246.6666666665, ans=0.2 2023-11-28 04:37:55,373 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-28 04:37:59,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2023-11-28 04:38:18,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-28 04:38:27,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0 2023-11-28 04:38:28,239 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10500, loss[loss=0.077, simple_loss=0.1055, pruned_loss=0.01739, audio_tagging_loss=0.006842, over 15796.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08966, pruned_loss=0.01237, audio_tagging_loss=0.008859, over 3056444.27 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:38:40,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3356580.0, ans=0.125 2023-11-28 04:38:43,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.671e+01 9.492e+01 1.004e+02 1.311e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 04:38:47,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3356580.0, ans=0.2 2023-11-28 04:38:52,137 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-28 04:38:52,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3356646.6666666665, ans=0.0 2023-11-28 04:38:59,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3356646.6666666665, ans=0.1 2023-11-28 04:39:25,932 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10550, loss[loss=0.05994, simple_loss=0.08506, pruned_loss=0.008823, audio_tagging_loss=0.008583, over 14476.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08983, pruned_loss=0.01241, audio_tagging_loss=0.00877, over 3050225.35 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:39:39,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3356913.3333333335, ans=0.125 2023-11-28 04:39:49,602 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-28 04:39:49,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3356980.0, ans=0.0 2023-11-28 04:40:00,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=12.0 2023-11-28 04:40:02,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3357046.6666666665, ans=0.0 2023-11-28 04:40:13,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357113.3333333335, ans=0.1 2023-11-28 04:40:17,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3357113.3333333335, ans=0.0 2023-11-28 04:40:18,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3357113.3333333335, ans=22.5 2023-11-28 04:40:19,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357113.3333333335, ans=0.1 2023-11-28 04:40:22,840 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10600, loss[loss=0.06086, simple_loss=0.09441, pruned_loss=0.00802, audio_tagging_loss=0.005635, over 15691.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08989, pruned_loss=0.01232, audio_tagging_loss=0.008724, over 3051128.98 frames. 
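The grad_scale column is the fp16 loss scale (use_fp16 is enabled in the configuration): dynamic loss scaling halves the scale when a scaled gradient overflows and doubles it again after a long run of clean steps, which matches the drops (32 -> 16 at batch 9250, and down to 8 around batch 10700 below) and recoveries (8 -> 16 -> 32) visible in this section. PyTorch's standard implementation; the constructor arguments shown are the library defaults, illustrative rather than this run's actual settings:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0 ** 16,
        backoff_factor=0.5,     # halve on overflow: 32 -> 16 -> 8
        growth_factor=2.0,      # double after growth_interval clean steps: 8 -> 16 -> 32
        growth_interval=2000,
    )
    # Typical step:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)    # skipped if inf/nan gradients are found
    #   scaler.update()           # adjusts the scale as described above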
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:40:25,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3357180.0, ans=0.0 2023-11-28 04:40:30,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3357180.0, ans=0.1 2023-11-28 04:40:37,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.827e+01 9.555e+01 1.028e+02 1.264e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 04:40:40,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3357246.6666666665, ans=0.0 2023-11-28 04:40:48,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503600 2023-11-28 04:40:49,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=15.0 2023-11-28 04:40:49,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3357313.3333333335, ans=0.125 2023-11-28 04:40:54,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3357313.3333333335, ans=0.2 2023-11-28 04:40:55,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3357313.3333333335, ans=0.125 2023-11-28 04:40:58,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3357380.0, ans=0.0 2023-11-28 04:40:59,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3357380.0, ans=0.125 2023-11-28 04:41:06,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3357380.0, ans=0.125 2023-11-28 04:41:07,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3357380.0, ans=0.125 2023-11-28 04:41:08,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357446.6666666665, ans=0.1 2023-11-28 04:41:10,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2023-11-28 04:41:17,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3357446.6666666665, ans=0.0 2023-11-28 04:41:21,506 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10650, loss[loss=0.06058, simple_loss=0.08352, pruned_loss=0.01046, audio_tagging_loss=0.008352, over 14805.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09029, pruned_loss=0.01238, audio_tagging_loss=0.008728, over 3050025.20 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:41:38,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357580.0, ans=0.1 2023-11-28 04:41:46,317 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503650 2023-11-28 04:41:50,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3357646.6666666665, ans=0.0 2023-11-28 04:42:20,168 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10700, loss[loss=0.08784, simple_loss=0.1237, pruned_loss=0.01753, audio_tagging_loss=0.008448, over 15763.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09158, pruned_loss=0.01268, audio_tagging_loss=0.008594, over 3048189.65 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:42:22,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3357846.6666666665, ans=0.1 2023-11-28 04:42:35,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.497e+01 9.278e+01 9.975e+01 1.438e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-28 04:42:38,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3357913.3333333335, ans=10.0 2023-11-28 04:42:43,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503700 2023-11-28 04:42:50,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3357980.0, ans=0.125 2023-11-28 04:43:16,268 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10750, loss[loss=0.06866, simple_loss=0.09882, pruned_loss=0.0118, audio_tagging_loss=0.007445, over 14905.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09051, pruned_loss=0.01244, audio_tagging_loss=0.00849, over 3049494.15 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:43:40,944 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503750 2023-11-28 04:43:49,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3358313.3333333335, ans=0.5 2023-11-28 04:43:50,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3358380.0, ans=0.0 2023-11-28 04:44:13,537 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10800, loss[loss=0.05515, simple_loss=0.07062, pruned_loss=0.01082, audio_tagging_loss=0.009021, over 14797.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09078, pruned_loss=0.01243, audio_tagging_loss=0.008421, over 3053559.01 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:44:14,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.14 vs. 
limit=15.0 2023-11-28 04:44:24,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3358580.0, ans=0.1 2023-11-28 04:44:30,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.815e+01 9.428e+01 9.959e+01 1.276e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 04:44:35,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3358580.0, ans=0.09899494936611666 2023-11-28 04:44:38,283 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503800 2023-11-28 04:44:41,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3358646.6666666665, ans=0.0 2023-11-28 04:44:44,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3358646.6666666665, ans=0.2 2023-11-28 04:44:48,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-28 04:45:02,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3358780.0, ans=0.125 2023-11-28 04:45:03,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3358780.0, ans=0.125 2023-11-28 04:45:12,764 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10850, loss[loss=0.07789, simple_loss=0.106, pruned_loss=0.01676, audio_tagging_loss=0.008119, over 15725.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09002, pruned_loss=0.01226, audio_tagging_loss=0.00853, over 3052461.15 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:45:25,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3358913.3333333335, ans=0.125 2023-11-28 04:45:25,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3358913.3333333335, ans=0.2 2023-11-28 04:45:36,426 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503850 2023-11-28 04:45:48,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3359046.6666666665, ans=0.125 2023-11-28 04:46:09,856 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10900, loss[loss=0.05928, simple_loss=0.08422, pruned_loss=0.00907, audio_tagging_loss=0.008098, over 15390.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09048, pruned_loss=0.01236, audio_tagging_loss=0.008607, over 3049948.52 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:46:09,876 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 04:46:25,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.788e+01 9.283e+01 9.844e+01 1.254e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 04:46:26,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3359246.6666666665, ans=0.125 2023-11-28 04:46:34,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503900 2023-11-28 04:47:05,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0 2023-11-28 04:47:07,425 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 10950, loss[loss=0.06764, simple_loss=0.0936, pruned_loss=0.01232, audio_tagging_loss=0.008517, over 15870.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09036, pruned_loss=0.0123, audio_tagging_loss=0.008615, over 3049924.72 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:47:22,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3359580.0, ans=0.0 2023-11-28 04:47:31,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 503950 2023-11-28 04:47:42,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3359713.3333333335, ans=0.0 2023-11-28 04:47:42,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3359713.3333333335, ans=0.125 2023-11-28 04:47:54,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3359780.0, ans=0.2 2023-11-28 04:47:56,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3359780.0, ans=0.2 2023-11-28 04:48:05,132 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11000, loss[loss=0.0662, simple_loss=0.08964, pruned_loss=0.01144, audio_tagging_loss=0.009942, over 15763.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0905, pruned_loss=0.01227, audio_tagging_loss=0.008622, over 3046602.16 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:48:08,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3359846.6666666665, ans=0.09899494936611666 2023-11-28 04:48:17,923 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:48:18,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3359913.3333333335, ans=0.0 2023-11-28 04:48:21,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.488e+01 9.034e+01 9.756e+01 1.163e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-28 04:48:26,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. 
limit=12.0 2023-11-28 04:48:29,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504000 2023-11-28 04:48:39,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3359980.0, ans=0.2 2023-11-28 04:48:40,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-28 04:48:44,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3360046.6666666665, ans=0.125 2023-11-28 04:48:46,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3360046.6666666665, ans=0.125 2023-11-28 04:48:52,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3360113.3333333335, ans=0.125 2023-11-28 04:49:02,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=12.0 2023-11-28 04:49:05,290 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11050, loss[loss=0.05807, simple_loss=0.0711, pruned_loss=0.01207, audio_tagging_loss=0.01045, over 15523.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09072, pruned_loss=0.01251, audio_tagging_loss=0.008676, over 3043698.42 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:49:22,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3360246.6666666665, ans=0.0 2023-11-28 04:49:28,509 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504050 2023-11-28 04:49:38,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3360380.0, ans=0.0 2023-11-28 04:49:39,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3360380.0, ans=0.125 2023-11-28 04:49:53,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.0 2023-11-28 04:50:02,378 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11100, loss[loss=0.03082, simple_loss=0.03018, pruned_loss=0.005248, audio_tagging_loss=0.01048, over 15034.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08975, pruned_loss=0.01242, audio_tagging_loss=0.008771, over 3043955.21 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:50:05,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3360513.3333333335, ans=0.07 2023-11-28 04:50:15,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0 2023-11-28 04:50:18,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.723e+01 9.489e+01 1.017e+02 2.061e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 04:50:25,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. 
limit=15.0 2023-11-28 04:50:26,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504100 2023-11-28 04:50:42,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3360713.3333333335, ans=0.0 2023-11-28 04:50:57,221 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:50:59,701 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11150, loss[loss=0.08761, simple_loss=0.11, pruned_loss=0.02203, audio_tagging_loss=0.01056, over 14817.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08935, pruned_loss=0.0123, audio_tagging_loss=0.008948, over 3039790.92 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:51:23,887 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504150 2023-11-28 04:51:26,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3360980.0, ans=0.2 2023-11-28 04:51:42,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3361046.6666666665, ans=0.0 2023-11-28 04:51:43,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3361046.6666666665, ans=0.125 2023-11-28 04:51:57,692 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11200, loss[loss=0.0718, simple_loss=0.08703, pruned_loss=0.01703, audio_tagging_loss=0.01125, over 14242.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08972, pruned_loss=0.01261, audio_tagging_loss=0.009105, over 3044125.35 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:52:04,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3361180.0, ans=0.125 2023-11-28 04:52:13,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.826e+01 9.324e+01 1.011e+02 1.372e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 04:52:21,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504200 2023-11-28 04:52:28,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3361313.3333333335, ans=0.1 2023-11-28 04:52:31,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2023-11-28 04:52:55,520 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11250, loss[loss=0.07094, simple_loss=0.09851, pruned_loss=0.01282, audio_tagging_loss=0.008866, over 15082.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08957, pruned_loss=0.01255, audio_tagging_loss=0.009075, over 3052189.54 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:53:02,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3361513.3333333335, ans=0.125 2023-11-28 04:53:03,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3361513.3333333335, ans=0.125 2023-11-28 04:53:12,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.45 vs. 
limit=15.0
2023-11-28 04:53:18,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3361646.6666666665, ans=0.125
2023-11-28 04:53:19,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504250
2023-11-28 04:53:20,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3361646.6666666665, ans=0.125
2023-11-28 04:53:40,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3361780.0, ans=0.0
2023-11-28 04:53:50,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3361846.6666666665, ans=0.125
2023-11-28 04:53:52,337 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11300, loss[loss=0.06181, simple_loss=0.09092, pruned_loss=0.007955, audio_tagging_loss=0.008389, over 14511.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08965, pruned_loss=0.01248, audio_tagging_loss=0.008967, over 3054261.67 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:54:04,107 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 04:54:06,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3361913.3333333335, ans=0.125
2023-11-28 04:54:09,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.810e+01 9.312e+01 1.008e+02 1.209e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-28 04:54:09,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3361913.3333333335, ans=0.07
2023-11-28 04:54:16,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504300
2023-11-28 04:54:23,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3361980.0, ans=0.125
2023-11-28 04:54:48,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3362113.3333333335, ans=0.1
2023-11-28 04:54:50,072 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11350, loss[loss=0.06705, simple_loss=0.09477, pruned_loss=0.01093, audio_tagging_loss=0.008735, over 14651.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08955, pruned_loss=0.01247, audio_tagging_loss=0.008883, over 3053551.10 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
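
The Clipping_scale lines from optim.py:476 summarize the distribution of recent gradient norms as five quantiles (min, 25%, median, 75%, max). In every such entry in this section the logged threshold equals clipping_scale times the logged median, e.g. 2.0 * 9.312e+01 = 1.862e+02 in the 04:54:09 entry just above, and percent-clipped is presumably the share of recent batches whose gradient norm exceeded that threshold. A minimal sketch of that bookkeeping, assuming a window of recent per-batch gradient norms (clipping_summary is a hypothetical helper, not the actual optim.py code):

    import torch

    def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quantiles of the recent gradient norms, in the order the log
        # prints them: min, 25%, median, 75%, max.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 * median, matching the log
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped
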
2023-11-28 04:54:55,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3362180.0, ans=0.0
2023-11-28 04:55:03,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362246.6666666665, ans=0.1
2023-11-28 04:55:11,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3362246.6666666665, ans=0.125
2023-11-28 04:55:14,395 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504350
2023-11-28 04:55:37,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362446.6666666665, ans=0.1
2023-11-28 04:55:37,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0
2023-11-28 04:55:48,174 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11400, loss[loss=0.08012, simple_loss=0.1128, pruned_loss=0.01799, audio_tagging_loss=0.005723, over 15025.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09002, pruned_loss=0.01252, audio_tagging_loss=0.008716, over 3048289.94 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:55:52,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3362513.3333333335, ans=0.0
2023-11-28 04:55:55,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3362513.3333333335, ans=0.0
2023-11-28 04:56:05,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.951e+01 9.530e+01 1.041e+02 1.873e+02, threshold=1.906e+02, percent-clipped=1.0
2023-11-28 04:56:12,159 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504400
2023-11-28 04:56:45,798 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11450, loss[loss=0.08404, simple_loss=0.1087, pruned_loss=0.02163, audio_tagging_loss=0.008082, over 15627.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08992, pruned_loss=0.01273, audio_tagging_loss=0.008668, over 3045607.62 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0
2023-11-28 04:56:51,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3362846.6666666665, ans=0.0
2023-11-28 04:57:09,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504450
2023-11-28 04:57:12,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362980.0, ans=0.1
2023-11-28 04:57:23,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3363046.6666666665, ans=0.1
2023-11-28 04:57:27,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3363046.6666666665, ans=0.125
2023-11-28 04:57:41,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3363113.3333333335, ans=0.125
2023-11-28 04:57:43,782 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11500, loss[loss=0.0724, simple_loss=0.09551, pruned_loss=0.01668, audio_tagging_loss=0.007973, over 16716.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09003, pruned_loss=0.01272, audio_tagging_loss=0.008743, over 3050164.66 frames. ], batch size: 64, lr: 1.60e-03, grad_scale: 8.0
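
Most scaling.py:213 lines report a ScheduledFloat: a hyperparameter (dropout probability, skip rate, balancer probability, and so on) whose value is a function of batch_count rather than a constant, which is why each entry carries both the batch count and the current ans value. A minimal re-implementation of the idea as a piecewise-linear schedule over the batch count; the breakpoints below are illustrative, not the schedules used in this run:

    class ScheduledFloat:
        """A float that is a piecewise-linear function of the batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs; kept sorted by batch_count.
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # Illustrative: a dropout that decays from 0.3 to 0.1 over the first
    # 20k batches; by batch_count ~3.36e6 it has long since reached its
    # final value, consistent with the constant ans=0.1 entries above.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert abs(dropout_p.value(3362246.6666666665) - 0.1) < 1e-12
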
2023-11-28 04:57:46,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3363180.0, ans=0.125
2023-11-28 04:58:02,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.810e+01 9.465e+01 1.017e+02 1.248e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-28 04:58:08,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504500
2023-11-28 04:58:14,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0
2023-11-28 04:58:23,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3363380.0, ans=0.1
2023-11-28 04:58:36,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0
2023-11-28 04:58:40,740 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11550, loss[loss=0.0675, simple_loss=0.09483, pruned_loss=0.0114, audio_tagging_loss=0.008685, over 15237.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09025, pruned_loss=0.01271, audio_tagging_loss=0.008668, over 3054681.00 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-28 04:59:00,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3363580.0, ans=0.0
2023-11-28 04:59:05,979 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504550
2023-11-28 04:59:07,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3363646.6666666665, ans=0.125
2023-11-28 04:59:11,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3363646.6666666665, ans=0.0
2023-11-28 04:59:19,026 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 04:59:24,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3363713.3333333335, ans=0.2
2023-11-28 04:59:27,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3363780.0, ans=0.0
2023-11-28 04:59:31,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3363780.0, ans=0.1
2023-11-28 04:59:35,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0
2023-11-28 04:59:38,809 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11600, loss[loss=0.06514, simple_loss=0.08456, pruned_loss=0.01196, audio_tagging_loss=0.01091, over 15097.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09033, pruned_loss=0.0127, audio_tagging_loss=0.008703, over 3049663.23 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
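
The WARNING from train_asr.py:1481 above shows why AudioSet cuts with placeholder transcripts keep getting dropped: a 1-second cut has 100 feature frames, only 23 frames survive the front-end's subsampling, and that is fewer than the 24 BPE tokens of the dummy text, so the transducer loss has no valid alignment for it. A sketch of the check, assuming the conv front-end length formula T' = ((T - 7) // 2 + 1) // 2 (an assumption, but it reproduces the logged 100 -> 23):

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 reductions in the conv front-end (assumed formula).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer alignment needs at least one output frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert keep_cut(100, 24) is False  # the cut in the WARNING is excluded
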
2023-11-28 04:59:42,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3363846.6666666665, ans=0.0
2023-11-28 04:59:57,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.620e+01 9.416e+01 1.017e+02 1.407e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-28 05:00:02,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504600
2023-11-28 05:00:06,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3363980.0, ans=0.125
2023-11-28 05:00:21,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3364046.6666666665, ans=0.125
2023-11-28 05:00:26,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=22.5
2023-11-28 05:00:27,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0
2023-11-28 05:00:31,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3364113.3333333335, ans=0.0
2023-11-28 05:00:36,728 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11650, loss[loss=0.06847, simple_loss=0.09959, pruned_loss=0.01149, audio_tagging_loss=0.007187, over 14705.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09029, pruned_loss=0.01247, audio_tagging_loss=0.008594, over 3049271.43 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:00:47,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3364246.6666666665, ans=0.0
2023-11-28 05:00:57,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3364246.6666666665, ans=0.2
2023-11-28 05:01:01,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504650
2023-11-28 05:01:12,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3364380.0, ans=0.0
2023-11-28 05:01:16,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3364380.0, ans=0.2
2023-11-28 05:01:21,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3364446.6666666665, ans=0.125
2023-11-28 05:01:30,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3364446.6666666665, ans=0.125
2023-11-28 05:01:33,592 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11700, loss[loss=0.0701, simple_loss=0.103, pruned_loss=0.01036, audio_tagging_loss=0.008207, over 15079.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09011, pruned_loss=0.01238, audio_tagging_loss=0.008611, over 3055405.30 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:01:42,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs.
limit=22.5 2023-11-28 05:01:48,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3364580.0, ans=0.125 2023-11-28 05:01:52,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.810e+01 9.366e+01 1.007e+02 1.398e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 05:01:58,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504700 2023-11-28 05:02:11,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5 2023-11-28 05:02:26,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2023-11-28 05:02:29,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3364780.0, ans=0.1 2023-11-28 05:02:31,530 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11750, loss[loss=0.05534, simple_loss=0.078, pruned_loss=0.007738, audio_tagging_loss=0.0086, over 14747.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0908, pruned_loss=0.01254, audio_tagging_loss=0.008533, over 3053540.52 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:02:44,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3364913.3333333335, ans=0.125 2023-11-28 05:02:49,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3364913.3333333335, ans=0.0 2023-11-28 05:02:55,554 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504750 2023-11-28 05:03:18,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3365113.3333333335, ans=0.125 2023-11-28 05:03:19,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=12.0 2023-11-28 05:03:28,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3365180.0, ans=0.125 2023-11-28 05:03:29,544 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11800, loss[loss=0.08288, simple_loss=0.1082, pruned_loss=0.02285, audio_tagging_loss=0.005955, over 15439.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09068, pruned_loss=0.01262, audio_tagging_loss=0.008584, over 3046027.36 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:03:33,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3365180.0, ans=0.1 2023-11-28 05:03:47,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.611e+01 9.542e+01 1.045e+02 1.429e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 05:03:49,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3365246.6666666665, ans=0.2 2023-11-28 05:03:50,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3365246.6666666665, ans=0.125 2023-11-28 05:03:53,112 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504800 2023-11-28 05:03:56,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3365313.3333333335, ans=0.0 2023-11-28 05:04:12,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3365380.0, ans=0.0 2023-11-28 05:04:23,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3365446.6666666665, ans=0.0 2023-11-28 05:04:23,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3365446.6666666665, ans=0.125 2023-11-28 05:04:26,613 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11850, loss[loss=0.07258, simple_loss=0.09665, pruned_loss=0.01539, audio_tagging_loss=0.00886, over 15223.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0901, pruned_loss=0.01237, audio_tagging_loss=0.008755, over 3043973.96 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:04:30,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3365513.3333333335, ans=0.2 2023-11-28 05:04:40,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3365580.0, ans=0.0 2023-11-28 05:04:43,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365580.0, ans=0.1 2023-11-28 05:04:51,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504850 2023-11-28 05:05:03,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3365713.3333333335, ans=0.125 2023-11-28 05:05:08,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3365713.3333333335, ans=0.125 2023-11-28 05:05:09,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3365713.3333333335, ans=0.125 2023-11-28 05:05:11,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2023-11-28 05:05:13,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. 
limit=15.0 2023-11-28 05:05:24,489 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11900, loss[loss=0.07262, simple_loss=0.0982, pruned_loss=0.01437, audio_tagging_loss=0.009149, over 14643.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09024, pruned_loss=0.01224, audio_tagging_loss=0.008862, over 3041994.96 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:05:32,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=22.5 2023-11-28 05:05:34,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3365846.6666666665, ans=0.0 2023-11-28 05:05:43,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.747e+01 9.488e+01 1.023e+02 1.658e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 05:05:45,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3365913.3333333335, ans=0.125 2023-11-28 05:05:49,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504900 2023-11-28 05:05:53,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3365980.0, ans=0.0 2023-11-28 05:05:57,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-28 05:06:14,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3366113.3333333335, ans=0.0 2023-11-28 05:06:20,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3366113.3333333335, ans=0.035 2023-11-28 05:06:23,040 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 11950, loss[loss=0.061, simple_loss=0.07888, pruned_loss=0.01264, audio_tagging_loss=0.008918, over 14292.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08962, pruned_loss=0.01212, audio_tagging_loss=0.008931, over 3042293.91 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:06:26,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3366180.0, ans=0.0 2023-11-28 05:06:46,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 504950 2023-11-28 05:06:49,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3366313.3333333335, ans=0.2 2023-11-28 05:06:49,347 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:06:49,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.16 vs. 
limit=22.5 2023-11-28 05:06:54,644 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:07:04,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3366380.0, ans=0.125 2023-11-28 05:07:07,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3366446.6666666665, ans=0.1 2023-11-28 05:07:10,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3366446.6666666665, ans=0.5 2023-11-28 05:07:10,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3366446.6666666665, ans=0.0 2023-11-28 05:07:12,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3366446.6666666665, ans=0.1 2023-11-28 05:07:19,268 INFO [train_asr.py:1235] (1/4) Epoch 42, batch 12000, loss[loss=0.08571, simple_loss=0.1168, pruned_loss=0.01736, audio_tagging_loss=0.009955, over 15047.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08999, pruned_loss=0.01215, audio_tagging_loss=0.009015, over 3051187.22 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 05:07:19,269 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 05:07:32,296 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4613, 3.4336, 3.7616, 3.6358], device='cuda:1') 2023-11-28 05:07:54,226 INFO [train_asr.py:1267] (1/4) Epoch 42, validation: loss=0.05822, simple_loss=0.05066, pruned_loss=0.005316, audio_tagging_loss=0.02757, over 4681554.00 frames. 2023-11-28 05:07:54,227 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 05:08:09,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0 2023-11-28 05:08:11,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.775e+01 9.473e+01 1.010e+02 1.187e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 05:08:13,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-28 05:08:16,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505000 2023-11-28 05:08:35,702 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 0, loss[loss=0.07427, simple_loss=0.08098, pruned_loss=0.01288, audio_tagging_loss=0.0209, over 15048.00 frames. ], tot_loss[loss=0.07427, simple_loss=0.08098, pruned_loss=0.01288, audio_tagging_loss=0.0209, over 15048.00 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:08:35,702 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 05:08:50,574 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7493, 5.5330, 5.1729, 5.2573], device='cuda:1') 2023-11-28 05:09:10,067 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05773, simple_loss=0.0506, pruned_loss=0.005225, audio_tagging_loss=0.0272, over 4681554.00 frames. 
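
The loss[...] fields in the train_asr.py:1235 lines are internally consistent: throughout this section the total satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss to the logged precision, for training batches and for the validation lines alike (for the Epoch 43, batch 0 entry above, 0.5 * 0.08098 + 0.01288 + 0.0209 = 0.07427). A quick sanity check; the 0.5 and 1.0 scales here are inferred from the logged numbers, not taken from the training script:

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
        # Recombination implied by the logged loss[...] fields.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Epoch 43, batch 0: loss[loss=0.07427, simple_loss=0.08098,
    # pruned_loss=0.01288, audio_tagging_loss=0.0209]
    assert abs(total_loss(0.08098, 0.01288, 0.0209) - 0.07427) < 5e-5
    # Epoch 43 validation: loss=0.05773, simple_loss=0.0506,
    # pruned_loss=0.005225, audio_tagging_loss=0.0272
    assert abs(total_loss(0.0506, 0.005225, 0.0272) - 0.05773) < 5e-5
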
2023-11-28 05:09:10,068 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 05:09:28,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3366740.0, ans=0.0 2023-11-28 05:09:41,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3366806.6666666665, ans=0.125 2023-11-28 05:09:57,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-28 05:10:04,092 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505050 2023-11-28 05:10:07,278 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 50, loss[loss=0.05739, simple_loss=0.06133, pruned_loss=0.005504, audio_tagging_loss=0.02122, over 15047.00 frames. ], tot_loss[loss=0.07295, simple_loss=0.08853, pruned_loss=0.01185, audio_tagging_loss=0.01684, over 685217.20 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:10:20,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3367073.3333333335, ans=0.5 2023-11-28 05:10:24,617 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:10:24,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3367073.3333333335, ans=0.125 2023-11-28 05:10:27,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3367073.3333333335, ans=0.04949747468305833 2023-11-28 05:10:49,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. limit=10.0 2023-11-28 05:10:56,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 9.586e+01 1.037e+02 1.129e+02 1.417e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-28 05:11:01,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-28 05:11:04,369 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 100, loss[loss=0.08411, simple_loss=0.1175, pruned_loss=0.01465, audio_tagging_loss=0.0107, over 14797.00 frames. ], tot_loss[loss=0.07318, simple_loss=0.09046, pruned_loss=0.01208, audio_tagging_loss=0.01587, over 1209685.11 frames. 
], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:11:23,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3367406.6666666665, ans=0.0 2023-11-28 05:11:44,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3367540.0, ans=0.0 2023-11-28 05:11:52,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3367606.6666666665, ans=10.0 2023-11-28 05:11:52,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3367606.6666666665, ans=0.125 2023-11-28 05:11:55,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367606.6666666665, ans=0.1 2023-11-28 05:11:57,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3367606.6666666665, ans=0.2 2023-11-28 05:11:58,748 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-28 05:12:02,499 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 150, loss[loss=0.04232, simple_loss=0.05022, pruned_loss=0.00378, audio_tagging_loss=0.01344, over 16880.00 frames. ], tot_loss[loss=0.07096, simple_loss=0.08938, pruned_loss=0.01189, audio_tagging_loss=0.01438, over 1624031.56 frames. ], batch size: 66, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:12:05,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=22.5 2023-11-28 05:12:19,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3367740.0, ans=0.125 2023-11-28 05:12:42,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3367873.3333333335, ans=0.0 2023-11-28 05:12:42,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3367873.3333333335, ans=0.1 2023-11-28 05:12:48,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-28 05:12:50,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3367940.0, ans=0.07 2023-11-28 05:12:51,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3367940.0, ans=10.0 2023-11-28 05:12:52,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 9.055e+01 9.611e+01 1.032e+02 1.243e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 05:12:57,266 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-28 05:12:58,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.97 vs. limit=6.0 2023-11-28 05:13:01,125 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 200, loss[loss=0.05506, simple_loss=0.06677, pruned_loss=0.009813, audio_tagging_loss=0.01186, over 15807.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.08939, pruned_loss=0.01187, audio_tagging_loss=0.01271, over 1942335.46 frames. 
], batch size: 63, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:13:11,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3368073.3333333335, ans=0.125 2023-11-28 05:13:17,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3368073.3333333335, ans=0.125 2023-11-28 05:13:18,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3368073.3333333335, ans=0.125 2023-11-28 05:13:21,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3368073.3333333335, ans=0.1 2023-11-28 05:13:28,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3368140.0, ans=0.05 2023-11-28 05:13:43,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3368206.6666666665, ans=0.0 2023-11-28 05:13:54,400 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-28 05:13:57,689 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 250, loss[loss=0.04326, simple_loss=0.04866, pruned_loss=0.008571, audio_tagging_loss=0.01036, over 14912.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.08974, pruned_loss=0.01215, audio_tagging_loss=0.01149, over 2183510.21 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:14:06,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3368340.0, ans=0.0 2023-11-28 05:14:22,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3368473.3333333335, ans=0.0 2023-11-28 05:14:23,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-28 05:14:24,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3368473.3333333335, ans=0.0 2023-11-28 05:14:48,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 9.024e+01 9.625e+01 1.027e+02 1.223e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 05:14:51,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-28 05:14:55,470 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 300, loss[loss=0.05656, simple_loss=0.07349, pruned_loss=0.007926, audio_tagging_loss=0.01189, over 14439.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09058, pruned_loss=0.01239, audio_tagging_loss=0.01061, over 2385629.52 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:15:03,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3368673.3333333335, ans=0.125 2023-11-28 05:15:10,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2023-11-28 05:15:17,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3368806.6666666665, ans=0.0 2023-11-28 05:15:22,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3368806.6666666665, ans=0.125 2023-11-28 05:15:24,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3368806.6666666665, ans=0.0 2023-11-28 05:15:34,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3368873.3333333335, ans=0.0 2023-11-28 05:15:49,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-28 05:15:52,955 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 350, loss[loss=0.06545, simple_loss=0.09764, pruned_loss=0.0118, audio_tagging_loss=0.004824, over 15034.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09135, pruned_loss=0.0126, audio_tagging_loss=0.01006, over 2533684.92 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:15:59,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3369006.6666666665, ans=0.125 2023-11-28 05:16:02,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3369006.6666666665, ans=0.125 2023-11-28 05:16:03,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2023-11-28 05:16:36,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-28 05:16:42,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.904e+01 9.500e+01 1.023e+02 1.547e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:16:46,292 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-28 05:16:49,840 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 400, loss[loss=0.04963, simple_loss=0.06465, pruned_loss=0.007294, audio_tagging_loss=0.01001, over 15896.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09066, pruned_loss=0.0126, audio_tagging_loss=0.009819, over 2646377.37 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:17:04,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3369406.6666666665, ans=0.2 2023-11-28 05:17:27,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-28 05:17:43,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-28 05:17:46,364 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 450, loss[loss=0.03732, simple_loss=0.0506, pruned_loss=0.003374, audio_tagging_loss=0.008643, over 14788.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.0913, pruned_loss=0.01274, audio_tagging_loss=0.009524, over 2738342.25 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:17:51,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. 
limit=15.0 2023-11-28 05:18:05,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. limit=10.0 2023-11-28 05:18:07,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3369740.0, ans=0.125 2023-11-28 05:18:16,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3369806.6666666665, ans=0.0 2023-11-28 05:18:26,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2023-11-28 05:18:28,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3369873.3333333335, ans=0.0 2023-11-28 05:18:37,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.667e+01 9.242e+01 1.003e+02 1.378e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 05:18:38,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3369940.0, ans=0.125 2023-11-28 05:18:41,113 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-28 05:18:44,347 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 500, loss[loss=0.07601, simple_loss=0.1127, pruned_loss=0.01063, audio_tagging_loss=0.009054, over 15214.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09118, pruned_loss=0.01265, audio_tagging_loss=0.009254, over 2808094.48 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:18:44,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370006.6666666665, ans=0.1 2023-11-28 05:18:46,790 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:19:01,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3370073.3333333335, ans=0.125 2023-11-28 05:19:20,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0 2023-11-28 05:19:24,927 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:19:24,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3370206.6666666665, ans=0.2 2023-11-28 05:19:29,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370273.3333333335, ans=0.1 2023-11-28 05:19:38,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-28 05:19:41,719 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 550, loss[loss=0.06735, simple_loss=0.09634, pruned_loss=0.01216, audio_tagging_loss=0.007022, over 15317.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09084, pruned_loss=0.01252, audio_tagging_loss=0.009183, over 2860091.08 frames. 
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:19:53,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3370406.6666666665, ans=0.2 2023-11-28 05:20:02,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-28 05:20:14,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3370473.3333333335, ans=0.0 2023-11-28 05:20:15,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3370540.0, ans=0.0 2023-11-28 05:20:16,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3370540.0, ans=0.2 2023-11-28 05:20:17,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3370540.0, ans=0.2 2023-11-28 05:20:20,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3370540.0, ans=0.125 2023-11-28 05:20:31,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:32,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 9.076e+01 9.606e+01 1.009e+02 1.464e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 05:20:35,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:36,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-28 05:20:39,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-11-28 05:20:39,635 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 600, loss[loss=0.05589, simple_loss=0.07675, pruned_loss=0.008571, audio_tagging_loss=0.008945, over 14912.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09022, pruned_loss=0.01248, audio_tagging_loss=0.009104, over 2899858.16 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:20:43,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3370673.3333333335, ans=0.2 2023-11-28 05:21:01,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3370806.6666666665, ans=0.125 2023-11-28 05:21:18,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370873.3333333335, ans=0.1 2023-11-28 05:21:25,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3370940.0, ans=0.1 2023-11-28 05:21:26,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2023-11-28 05:21:34,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-28 05:21:37,653 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 650, loss[loss=0.05616, simple_loss=0.06696, pruned_loss=0.01168, audio_tagging_loss=0.01099, over 14295.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09124, pruned_loss=0.01275, audio_tagging_loss=0.008972, over 2925580.06 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:21:37,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3371006.6666666665, ans=0.0 2023-11-28 05:21:37,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-28 05:21:39,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3371006.6666666665, ans=0.2 2023-11-28 05:21:39,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3371006.6666666665, ans=0.0 2023-11-28 05:21:44,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2023-11-28 05:21:48,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3371073.3333333335, ans=0.125 2023-11-28 05:22:00,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371140.0, ans=0.1 2023-11-28 05:22:06,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.83 vs. limit=15.0 2023-11-28 05:22:28,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.720e+01 9.285e+01 9.863e+01 1.198e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:22:31,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-28 05:22:35,224 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 700, loss[loss=0.06083, simple_loss=0.08366, pruned_loss=0.01174, audio_tagging_loss=0.007265, over 15536.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09125, pruned_loss=0.01266, audio_tagging_loss=0.008946, over 2956866.64 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:22:36,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3371340.0, ans=0.0 2023-11-28 05:22:42,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371340.0, ans=0.1 2023-11-28 05:22:42,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. 
limit=15.0 2023-11-28 05:22:53,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3371406.6666666665, ans=0.0 2023-11-28 05:23:18,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3371540.0, ans=0.2 2023-11-28 05:23:30,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-28 05:23:33,238 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 750, loss[loss=0.06596, simple_loss=0.09892, pruned_loss=0.009138, audio_tagging_loss=0.007366, over 15482.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.0909, pruned_loss=0.01257, audio_tagging_loss=0.00894, over 2983641.62 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:23:51,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3371740.0, ans=0.0 2023-11-28 05:24:01,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-28 05:24:06,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371873.3333333335, ans=0.1 2023-11-28 05:24:06,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.95 vs. limit=10.0 2023-11-28 05:24:07,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3371873.3333333335, ans=0.0 2023-11-28 05:24:10,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3371873.3333333335, ans=0.1 2023-11-28 05:24:15,062 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:24:17,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-28 05:24:24,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.959e+01 9.414e+01 9.993e+01 1.273e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:24:26,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3371940.0, ans=0.125 2023-11-28 05:24:27,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-28 05:24:31,330 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 800, loss[loss=0.06252, simple_loss=0.09397, pruned_loss=0.009121, audio_tagging_loss=0.006418, over 14301.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09075, pruned_loss=0.01252, audio_tagging_loss=0.00891, over 2999156.67 frames. 
2023-11-28 05:24:34,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3372006.6666666665, ans=0.125
2023-11-28 05:24:48,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3372073.3333333335, ans=0.125
2023-11-28 05:24:56,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3372140.0, ans=0.125
2023-11-28 05:25:04,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3372206.6666666665, ans=0.2
2023-11-28 05:25:08,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3372206.6666666665, ans=0.125
2023-11-28 05:25:24,820 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505850
2023-11-28 05:25:28,128 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 850, loss[loss=0.06625, simple_loss=0.0906, pruned_loss=0.01215, audio_tagging_loss=0.008794, over 15031.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09099, pruned_loss=0.01259, audio_tagging_loss=0.008933, over 3013580.70 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:25:49,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3372406.6666666665, ans=0.04949747468305833
2023-11-28 05:25:57,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3372473.3333333335, ans=0.1
2023-11-28 05:26:01,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3372473.3333333335, ans=0.2
2023-11-28 05:26:16,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3372606.6666666665, ans=0.125
2023-11-28 05:26:18,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.784e+01 9.411e+01 9.995e+01 2.932e+02, threshold=1.882e+02, percent-clipped=1.0
2023-11-28 05:26:21,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505900
2023-11-28 05:26:26,160 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 900, loss[loss=0.06613, simple_loss=0.08382, pruned_loss=0.0133, audio_tagging_loss=0.01092, over 15795.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09079, pruned_loss=0.01254, audio_tagging_loss=0.008898, over 3018662.38 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:26:29,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3372673.3333333335, ans=0.04949747468305833
2023-11-28 05:26:34,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0
2023-11-28 05:26:46,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3372740.0, ans=0.125
2023-11-28 05:26:52,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0
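The per-batch loss[...] entries decompose into the transducer's simple and pruned losses plus the audio-tagging distillation loss, and the reported totals are consistent with a fixed linear combination: for batch 650 above, 0.5 * 0.06696 + 0.01168 + 1.0 * 0.01099 ~= 0.05616. A sketch of that combination, with the 0.5 and 1.0 scales inferred from the logged numbers (warm-up behaviour of the simple/pruned split is not modelled):

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    """Linear combination consistent with the logged totals, e.g.
    batch 650: 0.5*0.06696 + 0.01168 + 1.0*0.01099 ~= 0.05616."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

assert abs(combined_loss(0.06696, 0.01168, 0.01099) - 0.05616) < 1e-4
```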
2023-11-28 05:26:53,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0
2023-11-28 05:27:19,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 505950
2023-11-28 05:27:23,368 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 950, loss[loss=0.06532, simple_loss=0.08324, pruned_loss=0.01328, audio_tagging_loss=0.01041, over 16001.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09039, pruned_loss=0.01235, audio_tagging_loss=0.00882, over 3023916.31 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:27:24,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3373006.6666666665, ans=0.1
2023-11-28 05:27:25,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5
2023-11-28 05:27:25,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3373006.6666666665, ans=0.2
2023-11-28 05:27:40,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3373073.3333333335, ans=0.1
2023-11-28 05:27:45,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3373140.0, ans=0.125
2023-11-28 05:28:00,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3373206.6666666665, ans=0.125
2023-11-28 05:28:08,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3373273.3333333335, ans=0.125
2023-11-28 05:28:14,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.684e+01 9.471e+01 1.027e+02 1.244e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-28 05:28:17,488 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506000
2023-11-28 05:28:21,350 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1000, loss[loss=0.05539, simple_loss=0.07527, pruned_loss=0.009287, audio_tagging_loss=0.008469, over 15558.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09027, pruned_loss=0.0124, audio_tagging_loss=0.008689, over 3023983.75 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:28:44,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3373473.3333333335, ans=0.0
2023-11-28 05:28:47,931 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 05:29:08,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3373606.6666666665, ans=0.125
2023-11-28 05:29:08,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.92 vs. limit=10.0
2023-11-28 05:29:14,897 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506050
2023-11-28 05:29:18,177 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1050, loss[loss=0.06947, simple_loss=0.1051, pruned_loss=0.01117, audio_tagging_loss=0.00574, over 15936.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09125, pruned_loss=0.01252, audio_tagging_loss=0.008611, over 3037361.52 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:29:20,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3373673.3333333335, ans=0.1
2023-11-28 05:29:20,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5
2023-11-28 05:29:27,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3373673.3333333335, ans=0.1
2023-11-28 05:29:31,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2023-11-28 05:29:39,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3373740.0, ans=0.2
2023-11-28 05:29:53,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3373873.3333333335, ans=0.2
2023-11-28 05:29:54,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3373873.3333333335, ans=0.0
2023-11-28 05:30:00,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3373873.3333333335, ans=0.125
2023-11-28 05:30:00,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3373873.3333333335, ans=0.125
2023-11-28 05:30:08,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3373940.0, ans=0.125
2023-11-28 05:30:09,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.822e+01 9.430e+01 1.008e+02 1.221e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-28 05:30:13,102 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506100
2023-11-28 05:30:16,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3374006.6666666665, ans=0.1
2023-11-28 05:30:16,835 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1100, loss[loss=0.07663, simple_loss=0.1102, pruned_loss=0.01367, audio_tagging_loss=0.007872, over 13960.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09053, pruned_loss=0.01247, audio_tagging_loss=0.008704, over 3034513.22 frames. ], batch size: 50, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:30:21,732 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 05:30:28,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3374073.3333333335, ans=0.2
2023-11-28 05:30:30,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3374073.3333333335, ans=0.125
2023-11-28 05:30:51,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3374206.6666666665, ans=0.125
2023-11-28 05:30:57,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3374206.6666666665, ans=0.1
2023-11-28 05:31:02,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0
2023-11-28 05:31:11,350 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506150
2023-11-28 05:31:14,621 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1150, loss[loss=0.0558, simple_loss=0.07752, pruned_loss=0.01117, audio_tagging_loss=0.005866, over 15533.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08991, pruned_loss=0.01231, audio_tagging_loss=0.008704, over 3038989.43 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:31:22,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3374340.0, ans=0.5
2023-11-28 05:31:26,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0
2023-11-28 05:31:26,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3374406.6666666665, ans=0.125
2023-11-28 05:32:06,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.704e+01 9.429e+01 9.950e+01 1.461e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-28 05:32:08,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506200
2023-11-28 05:32:11,948 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1200, loss[loss=0.0891, simple_loss=0.1268, pruned_loss=0.01864, audio_tagging_loss=0.007044, over 15740.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09102, pruned_loss=0.0124, audio_tagging_loss=0.008532, over 3043038.25 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:32:24,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3374740.0, ans=0.0
2023-11-28 05:32:24,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3374740.0, ans=0.125
2023-11-28 05:32:33,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3374740.0, ans=0.125
2023-11-28 05:32:55,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3374873.3333333335, ans=0.015
2023-11-28 05:33:05,785 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506250
2023-11-28 05:33:09,515 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1250, loss[loss=0.0709, simple_loss=0.108, pruned_loss=0.01087, audio_tagging_loss=0.00605, over 15448.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09034, pruned_loss=0.01222, audio_tagging_loss=0.008514, over 3051335.79 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
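The WARNING records above come from a filter that drops cuts whose encoder output would be shorter than their token sequence: the 1-second AudioSet clips carry a 24-token dummy transcript, but only 23 frames survive the convolutional front-end. A sketch of the check follows; the subsampling arithmetic reproduces the logged 100 -> 23 but is an assumption about the exact front-end.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Reproduces the logged numbers: 100 frames in, 23 frames out.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """A transducer needs at least one encoder frame per output token,
    so cuts like the dummy-text AudioSet clips (24 tokens, 23 frames)
    are excluded with the warnings seen above."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)
```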
2023-11-28 05:33:21,801 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 05:33:25,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3375073.3333333335, ans=0.2
2023-11-28 05:33:26,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3375073.3333333335, ans=0.125
2023-11-28 05:34:02,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.863e+01 9.431e+01 1.030e+02 1.303e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-28 05:34:04,696 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506300
2023-11-28 05:34:07,940 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1300, loss[loss=0.04839, simple_loss=0.06506, pruned_loss=0.008262, audio_tagging_loss=0.007602, over 14513.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08979, pruned_loss=0.01217, audio_tagging_loss=0.00855, over 3042491.08 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:34:23,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3375406.6666666665, ans=0.125
2023-11-28 05:34:40,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3375473.3333333335, ans=0.0
2023-11-28 05:34:41,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3375540.0, ans=0.125
2023-11-28 05:34:49,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3375540.0, ans=0.1
2023-11-28 05:35:01,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506350
2023-11-28 05:35:04,842 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1350, loss[loss=0.07098, simple_loss=0.08022, pruned_loss=0.01936, audio_tagging_loss=0.01151, over 14735.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09093, pruned_loss=0.01231, audio_tagging_loss=0.00848, over 3042317.43 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:35:12,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3375673.3333333335, ans=0.0
2023-11-28 05:35:14,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3375673.3333333335, ans=0.125
2023-11-28 05:35:18,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3375740.0, ans=0.125
2023-11-28 05:35:35,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3375806.6666666665, ans=0.0
2023-11-28 05:35:41,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3375873.3333333335, ans=0.0
2023-11-28 05:35:41,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0
2023-11-28 05:35:48,618 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 05:35:48,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3375873.3333333335, ans=0.0
2023-11-28 05:35:53,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3375940.0, ans=0.125
2023-11-28 05:35:53,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3375940.0, ans=0.125
2023-11-28 05:35:57,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.640e+01 9.329e+01 1.009e+02 1.189e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-28 05:35:59,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506400
2023-11-28 05:36:02,706 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1400, loss[loss=0.06755, simple_loss=0.08689, pruned_loss=0.01329, audio_tagging_loss=0.01081, over 15424.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09163, pruned_loss=0.0123, audio_tagging_loss=0.008552, over 3046751.35 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:36:17,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3376073.3333333335, ans=0.125
2023-11-28 05:36:17,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3376073.3333333335, ans=0.125
2023-11-28 05:36:30,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0
2023-11-28 05:36:39,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3376206.6666666665, ans=0.0
2023-11-28 05:36:57,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506450
2023-11-28 05:37:01,516 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1450, loss[loss=0.08844, simple_loss=0.1217, pruned_loss=0.02137, audio_tagging_loss=0.006233, over 15730.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09183, pruned_loss=0.0123, audio_tagging_loss=0.008563, over 3046472.71 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0
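The Whitening records compare a per-module statistic against a limit; the statistic measures how far the channel covariance of the module's output is from a scaled identity ("white"), with a penalty applied when it drifts past the limit. Below is a self-contained sketch of one such metric, equal to 1.0 for perfectly white features and larger otherwise; the exact definition in scaling.py may differ.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Ratio of the mean squared eigenvalue of the channel covariance to
    the squared mean eigenvalue: 1.0 iff the covariance is a scaled
    identity, growing as a few directions dominate."""
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

# e.g. a module with num_channels=512 whose activations are far from white
# would log something like 'metric=10.04 vs. limit=15.0' above.
x = torch.randn(1000, 512) * torch.linspace(0.1, 3.0, 512)
print(float(whitening_metric(x)))  # > 1.0
```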
2023-11-28 05:37:14,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3376406.6666666665, ans=0.2
2023-11-28 05:37:20,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3376406.6666666665, ans=0.0
2023-11-28 05:37:51,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376606.6666666665, ans=0.1
2023-11-28 05:37:53,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.627e+01 9.329e+01 1.021e+02 1.483e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-28 05:37:54,940 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506500
2023-11-28 05:37:58,156 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1500, loss[loss=0.07, simple_loss=0.0928, pruned_loss=0.01453, audio_tagging_loss=0.009069, over 14969.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09145, pruned_loss=0.01232, audio_tagging_loss=0.008615, over 3041310.54 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:38:03,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3376673.3333333335, ans=0.1
2023-11-28 05:38:11,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3376740.0, ans=0.125
2023-11-28 05:38:28,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3376806.6666666665, ans=0.125
2023-11-28 05:38:35,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3376873.3333333335, ans=0.125
2023-11-28 05:38:48,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3376940.0, ans=0.2
2023-11-28 05:38:50,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3376940.0, ans=0.0
2023-11-28 05:38:52,932 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506550
2023-11-28 05:38:56,147 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1550, loss[loss=0.0567, simple_loss=0.07925, pruned_loss=0.007616, audio_tagging_loss=0.009462, over 16015.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09062, pruned_loss=0.01217, audio_tagging_loss=0.008756, over 3039866.29 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:38:59,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3377006.6666666665, ans=0.125
2023-11-28 05:39:08,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5
2023-11-28 05:39:24,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3377140.0, ans=0.05
2023-11-28 05:39:25,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3377140.0, ans=0.0
2023-11-28 05:39:29,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3377206.6666666665, ans=0.0
2023-11-28 05:39:49,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 9.028e+01 9.506e+01 1.021e+02 1.396e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-28 05:39:51,073 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506600
2023-11-28 05:39:54,701 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1600, loss[loss=0.06865, simple_loss=0.08855, pruned_loss=0.0163, audio_tagging_loss=0.008068, over 14896.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.0906, pruned_loss=0.01216, audio_tagging_loss=0.008841, over 3040533.53 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:40:04,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3377340.0, ans=0.125
2023-11-28 05:40:23,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3377473.3333333335, ans=0.0
2023-11-28 05:40:44,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5
2023-11-28 05:40:49,100 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506650
2023-11-28 05:40:52,370 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1650, loss[loss=0.05496, simple_loss=0.07104, pruned_loss=0.01202, audio_tagging_loss=0.007426, over 14871.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09049, pruned_loss=0.01228, audio_tagging_loss=0.008858, over 3036046.93 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:40:56,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5
2023-11-28 05:41:08,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3377740.0, ans=0.0
2023-11-28 05:41:10,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0
2023-11-28 05:41:43,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3377940.0, ans=0.0
2023-11-28 05:41:46,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.798e+01 9.580e+01 1.024e+02 1.381e+02, threshold=1.916e+02, percent-clipped=0.0
2023-11-28 05:41:46,426 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506700
2023-11-28 05:41:50,513 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1700, loss[loss=0.0678, simple_loss=0.1003, pruned_loss=0.01084, audio_tagging_loss=0.006829, over 15751.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08992, pruned_loss=0.01235, audio_tagging_loss=0.008927, over 3049427.66 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0
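The grad_scale field in the batch records is fp16 loss-scaling state: it sits at 16.0 for a stretch, doubles to 32.0 at batch 1600, and is back at 16.0 by batch 1650, the signature of a dynamic scaler that grows the scale after a run of clean steps and halves it when a step overflows. A sketch using PyTorch's stock GradScaler (the growth_interval is an assumption, and icefall wraps its own scaler around training):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # matches the grad_scale values in the log
    growth_factor=2.0,     # 16.0 -> 32.0 after a clean stretch
    backoff_factor=0.5,    # 32.0 -> 16.0 when a step overflows
    growth_interval=2000,  # assumed; not recoverable from the log
)

def fp16_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skipped if grads contain inf/nan
    scaler.update()         # grows or backs off the scale
    return loss.detach()
```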
2023-11-28 05:41:56,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3378006.6666666665, ans=0.1
2023-11-28 05:41:56,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0
2023-11-28 05:42:01,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0
2023-11-28 05:42:09,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3378073.3333333335, ans=0.125
2023-11-28 05:42:12,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3378140.0, ans=0.1
2023-11-28 05:42:16,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3378140.0, ans=0.125
2023-11-28 05:42:35,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3378273.3333333335, ans=0.125
2023-11-28 05:42:44,603 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506750
2023-11-28 05:42:48,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0
2023-11-28 05:42:48,370 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1750, loss[loss=0.06363, simple_loss=0.1023, pruned_loss=0.006278, audio_tagging_loss=0.006193, over 15791.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08911, pruned_loss=0.01214, audio_tagging_loss=0.008977, over 3046661.70 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:42:50,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3378340.0, ans=0.1
2023-11-28 05:43:02,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0
2023-11-28 05:43:07,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3378406.6666666665, ans=0.0
2023-11-28 05:43:18,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3378473.3333333335, ans=0.125
2023-11-28 05:43:19,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3378473.3333333335, ans=0.125
2023-11-28 05:43:21,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3378540.0, ans=0.5
2023-11-28 05:43:25,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0
2023-11-28 05:43:27,576 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 05:43:41,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.622e+01 9.287e+01 9.848e+01 1.344e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-28 05:43:42,034 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506800
2023-11-28 05:43:45,540 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1800, loss[loss=0.05905, simple_loss=0.09083, pruned_loss=0.008299, audio_tagging_loss=0.005333, over 14396.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08874, pruned_loss=0.01193, audio_tagging_loss=0.00892, over 3038320.51 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:43:50,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=15.0
2023-11-28 05:43:57,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3378740.0, ans=0.0
2023-11-28 05:44:09,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3378806.6666666665, ans=0.125
2023-11-28 05:44:21,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3378873.3333333335, ans=0.125
2023-11-28 05:44:39,511 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506850
2023-11-28 05:44:43,410 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1850, loss[loss=0.07936, simple_loss=0.1105, pruned_loss=0.01551, audio_tagging_loss=0.008609, over 15024.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08849, pruned_loss=0.01203, audio_tagging_loss=0.008906, over 3040567.82 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:45:10,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0
2023-11-28 05:45:37,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.914e+01 9.536e+01 1.008e+02 1.259e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-28 05:45:37,650 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506900
2023-11-28 05:45:41,369 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1900, loss[loss=0.05596, simple_loss=0.06902, pruned_loss=0.01338, audio_tagging_loss=0.008074, over 16412.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08909, pruned_loss=0.01225, audio_tagging_loss=0.008771, over 3043503.12 frames. ], batch size: 63, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:45:41,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3379340.0, ans=0.0
2023-11-28 05:46:01,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3379406.6666666665, ans=0.0
2023-11-28 05:46:04,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5
2023-11-28 05:46:06,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3379473.3333333335, ans=0.125
2023-11-28 05:46:09,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3379473.3333333335, ans=0.125
2023-11-28 05:46:14,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0
2023-11-28 05:46:27,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3379606.6666666665, ans=0.125
2023-11-28 05:46:35,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 506950
2023-11-28 05:46:37,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3379673.3333333335, ans=0.125
2023-11-28 05:46:38,585 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 1950, loss[loss=0.07235, simple_loss=0.09974, pruned_loss=0.01492, audio_tagging_loss=0.007552, over 15210.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0893, pruned_loss=0.0122, audio_tagging_loss=0.008613, over 3045157.95 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:46:45,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3379673.3333333335, ans=0.125
2023-11-28 05:46:51,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3379740.0, ans=0.125
2023-11-28 05:47:25,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3379940.0, ans=0.125
2023-11-28 05:47:30,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3379940.0, ans=0.125
2023-11-28 05:47:32,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.861e+01 9.415e+01 1.012e+02 1.225e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-28 05:47:32,749 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507000
2023-11-28 05:47:36,945 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2000, loss[loss=0.06762, simple_loss=0.09289, pruned_loss=0.0125, audio_tagging_loss=0.008673, over 14586.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08841, pruned_loss=0.01208, audio_tagging_loss=0.008689, over 3042446.41 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:47:57,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3380073.3333333335, ans=0.07
2023-11-28 05:48:01,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3380140.0, ans=0.05
2023-11-28 05:48:08,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0
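Alongside the per-batch loss, each record carries tot_loss[... over ~3e6 frames. ]: a frame-weighted aggregate over recent batches rather than a whole-epoch mean, which is why the frame count hovers around three million instead of growing without bound. A toy version is sketched below; the 0.99 decay that caps the effective window is an assumption, not icefall's MetricsTracker.

```python
class RunningLoss:
    """Frame-weighted running aggregate in the spirit of the tot_loss[...]
    entries.  The 0.99 decay (an assumption) keeps the effective frame
    count bounded, mimicking the ~3e6-frame window seen in the log."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.frames = 0.0
        self.sums = {}

    def update(self, num_frames: float, **losses: float) -> None:
        self.frames = self.decay * self.frames + num_frames
        for name, value in losses.items():
            prev = self.decay * self.sums.get(name, 0.0)
            self.sums[name] = prev + value * num_frames

    def report(self) -> str:
        parts = [f"{k}={v / self.frames:.4g}" for k, v in self.sums.items()]
        return f"tot_loss[{', '.join(parts)}, over {self.frames:.2f} frames. ]"

tracker = RunningLoss()
tracker.update(15378, loss=0.06531, simple_loss=0.08811)
print(tracker.report())
```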
2023-11-28 05:48:13,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3380206.6666666665, ans=0.125
2023-11-28 05:48:22,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3380273.3333333335, ans=0.07
2023-11-28 05:48:30,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=22.5
2023-11-28 05:48:31,201 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507050
2023-11-28 05:48:34,891 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2050, loss[loss=0.07967, simple_loss=0.1093, pruned_loss=0.01647, audio_tagging_loss=0.008551, over 15035.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.089, pruned_loss=0.01226, audio_tagging_loss=0.008714, over 3041582.61 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:48:57,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0
2023-11-28 05:49:21,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=22.5
2023-11-28 05:49:29,189 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507100
2023-11-28 05:49:30,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.113e+01 9.631e+01 1.014e+02 1.250e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-28 05:49:32,373 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2100, loss[loss=0.06294, simple_loss=0.08096, pruned_loss=0.01328, audio_tagging_loss=0.009183, over 15229.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08965, pruned_loss=0.01232, audio_tagging_loss=0.008609, over 3038237.61 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:49:48,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3380740.0, ans=0.0
2023-11-28 05:50:06,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3380873.3333333335, ans=0.0
2023-11-28 05:50:14,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3380873.3333333335, ans=0.125
2023-11-28 05:50:15,444 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 05:50:22,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3380940.0, ans=0.125
2023-11-28 05:50:26,443 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507150
2023-11-28 05:50:29,589 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2150, loss[loss=0.08447, simple_loss=0.1184, pruned_loss=0.01926, audio_tagging_loss=0.006005, over 15960.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08939, pruned_loss=0.01214, audio_tagging_loss=0.008622, over 3040066.47 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:50:36,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0
2023-11-28 05:50:37,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3381006.6666666665, ans=0.0
2023-11-28 05:50:55,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3381140.0, ans=0.125
2023-11-28 05:50:57,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3381140.0, ans=0.0
2023-11-28 05:51:01,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3381140.0, ans=0.125
2023-11-28 05:51:07,336 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 05:51:19,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0
2023-11-28 05:51:25,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507200
2023-11-28 05:51:26,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.657e+01 9.306e+01 1.016e+02 1.700e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-28 05:51:28,713 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2200, loss[loss=0.06769, simple_loss=0.08601, pruned_loss=0.01311, audio_tagging_loss=0.01158, over 14460.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08918, pruned_loss=0.01211, audio_tagging_loss=0.008675, over 3042097.83 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:51:39,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3381406.6666666665, ans=0.125
2023-11-28 05:51:40,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3381406.6666666665, ans=0.2
2023-11-28 05:51:55,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3381473.3333333335, ans=0.0
2023-11-28 05:51:59,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3381473.3333333335, ans=0.07
2023-11-28 05:52:08,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3381540.0, ans=0.125
2023-11-28 05:52:23,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507250
2023-11-28 05:52:27,088 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2250, loss[loss=0.06723, simple_loss=0.09224, pruned_loss=0.01244, audio_tagging_loss=0.008669, over 14346.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08927, pruned_loss=0.01207, audio_tagging_loss=0.008706, over 3043972.94 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:52:41,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3381740.0, ans=0.125
2023-11-28 05:52:50,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2023-11-28 05:52:50,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0
2023-11-28 05:52:50,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0
2023-11-28 05:53:21,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507300
2023-11-28 05:53:22,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.876e+01 9.357e+01 9.943e+01 1.279e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-28 05:53:24,249 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2300, loss[loss=0.07372, simple_loss=0.1016, pruned_loss=0.01596, audio_tagging_loss=0.006962, over 15372.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08882, pruned_loss=0.01211, audio_tagging_loss=0.008673, over 3038698.72 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:53:53,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3382140.0, ans=0.125
2023-11-28 05:53:53,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3382140.0, ans=0.125
2023-11-28 05:54:17,413 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 05:54:18,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507350
2023-11-28 05:54:21,706 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2350, loss[loss=0.06204, simple_loss=0.07836, pruned_loss=0.01435, audio_tagging_loss=0.008508, over 14942.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08885, pruned_loss=0.0122, audio_tagging_loss=0.008739, over 3043908.26 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 05:54:23,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3382340.0, ans=0.1
2023-11-28 05:54:36,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3382406.6666666665, ans=0.1
2023-11-28 05:54:40,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0
2023-11-28 05:54:51,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3382473.3333333335, ans=0.05
2023-11-28 05:55:07,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3382606.6666666665, ans=0.2
2023-11-28 05:55:17,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507400
2023-11-28 05:55:18,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.819e+01 9.502e+01 1.018e+02 1.349e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-28 05:55:21,473 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2400, loss[loss=0.08107, simple_loss=0.112, pruned_loss=0.01673, audio_tagging_loss=0.008338, over 15533.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08938, pruned_loss=0.01244, audio_tagging_loss=0.008868, over 3040478.81 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:56:02,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3382873.3333333335, ans=0.0
2023-11-28 05:56:11,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3382940.0, ans=0.0
2023-11-28 05:56:11,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.32 vs. limit=10.0
2023-11-28 05:56:14,618 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507450
2023-11-28 05:56:14,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3382940.0, ans=0.0
2023-11-28 05:56:17,866 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2450, loss[loss=0.08138, simple_loss=0.1155, pruned_loss=0.01635, audio_tagging_loss=0.007263, over 15026.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08954, pruned_loss=0.01244, audio_tagging_loss=0.008936, over 3044509.67 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:56:58,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383206.6666666665, ans=0.1
2023-11-28 05:57:09,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3383273.3333333335, ans=0.0
2023-11-28 05:57:12,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507500
2023-11-28 05:57:13,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.778e+01 9.508e+01 1.025e+02 1.201e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 05:57:15,557 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2500, loss[loss=0.0759, simple_loss=0.1116, pruned_loss=0.01299, audio_tagging_loss=0.007107, over 14863.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08868, pruned_loss=0.01234, audio_tagging_loss=0.009034, over 3042090.84 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:57:22,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383340.0, ans=0.1
2023-11-28 05:58:00,520 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 05:58:04,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383606.6666666665, ans=0.1
2023-11-28 05:58:04,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3383606.6666666665, ans=0.125
2023-11-28 05:58:10,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507550
2023-11-28 05:58:14,144 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2550, loss[loss=0.0682, simple_loss=0.09922, pruned_loss=0.01202, audio_tagging_loss=0.006565, over 15411.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0891, pruned_loss=0.01227, audio_tagging_loss=0.008914, over 3044506.11 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:58:30,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3383740.0, ans=0.0
2023-11-28 05:58:42,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3383806.6666666665, ans=0.0
2023-11-28 05:58:46,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0
2023-11-28 05:58:58,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3383873.3333333335, ans=0.125
2023-11-28 05:58:59,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0
2023-11-28 05:59:08,248 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507600
2023-11-28 05:59:09,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.563e+01 9.261e+01 9.725e+01 1.208e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-28 05:59:09,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3383940.0, ans=0.125
2023-11-28 05:59:10,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3384006.6666666665, ans=0.125
2023-11-28 05:59:11,677 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2600, loss[loss=0.05155, simple_loss=0.05785, pruned_loss=0.01149, audio_tagging_loss=0.01113, over 15594.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08847, pruned_loss=0.01215, audio_tagging_loss=0.008865, over 3042865.39 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0
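The scaling.py:1118 WithLoss records attach an auxiliary penalty to the attention weights and log its accumulated sum; loss-sum=0.000e+00 simply means the penalty never fired in the reporting window. The trigger used below (probability mass collapsing onto a single key) is purely illustrative; the real criterion in scaling.py may be different.

```python
import torch

class AttentionWeightsPenalty(torch.nn.Module):
    """Illustrative auxiliary loss on attention weights whose running sum
    is logged, WithLoss-style.  A zero sum means no distribution exceeded
    the cap during the window."""

    def __init__(self, name: str, max_mass: float = 0.99):
        super().__init__()
        self.name = name
        self.max_mass = max_mass
        self.loss_sum = 0.0

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (..., num_keys), rows summing to 1
        excess = (attn_weights.max(dim=-1).values - self.max_mass).clamp(min=0)
        penalty = excess.sum()
        self.loss_sum += float(penalty.detach())
        return penalty

penalty = AttentionWeightsPenalty("encoder.encoders.1...self_attn_weights")
w = torch.softmax(torch.randn(4, 8, 16), dim=-1)
_ = penalty(w)
print(f"WithLoss: name={penalty.name}, loss-sum={penalty.loss_sum:.3e}")
```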
], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:59:20,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3384006.6666666665, ans=0.2 2023-11-28 05:59:47,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3384206.6666666665, ans=0.125 2023-11-28 05:59:50,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3384206.6666666665, ans=0.125 2023-11-28 06:00:06,184 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-28 06:00:09,387 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2650, loss[loss=0.06115, simple_loss=0.0839, pruned_loss=0.01026, audio_tagging_loss=0.008943, over 14648.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08812, pruned_loss=0.01211, audio_tagging_loss=0.008813, over 3042152.95 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:00:19,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3384406.6666666665, ans=0.0 2023-11-28 06:00:24,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3384406.6666666665, ans=0.125 2023-11-28 06:00:35,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3384473.3333333335, ans=0.125 2023-11-28 06:01:03,459 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-28 06:01:03,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3384606.6666666665, ans=0.04949747468305833 2023-11-28 06:01:04,923 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.714e+01 9.424e+01 1.027e+02 1.447e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:01:07,163 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2700, loss[loss=0.05416, simple_loss=0.06637, pruned_loss=0.01063, audio_tagging_loss=0.01035, over 14286.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08738, pruned_loss=0.01195, audio_tagging_loss=0.00884, over 3038009.94 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:01:19,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2023-11-28 06:01:22,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3384740.0, ans=0.125 2023-11-28 06:01:24,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3384740.0, ans=0.125 2023-11-28 06:01:26,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. 
limit=15.0 2023-11-28 06:01:47,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3384873.3333333335, ans=0.2 2023-11-28 06:01:47,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3384873.3333333335, ans=0.0 2023-11-28 06:01:54,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3384940.0, ans=0.125 2023-11-28 06:01:57,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3384940.0, ans=0.95 2023-11-28 06:02:01,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-28 06:02:02,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2023-11-28 06:02:04,878 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2750, loss[loss=0.05932, simple_loss=0.07346, pruned_loss=0.01296, audio_tagging_loss=0.009636, over 14863.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08769, pruned_loss=0.01209, audio_tagging_loss=0.008744, over 3046606.05 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:02:06,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3385006.6666666665, ans=0.0 2023-11-28 06:02:11,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385006.6666666665, ans=0.1 2023-11-28 06:02:14,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385073.3333333335, ans=0.1 2023-11-28 06:02:33,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-28 06:02:35,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2023-11-28 06:02:40,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3385206.6666666665, ans=0.125 2023-11-28 06:02:44,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3385206.6666666665, ans=0.5 2023-11-28 06:02:45,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3385206.6666666665, ans=0.2 2023-11-28 06:02:57,433 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 06:02:58,577 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-28 06:02:59,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.812e+01 9.371e+01 1.002e+02 1.155e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 06:03:02,361 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2800, loss[loss=0.08393, simple_loss=0.1161, pruned_loss=0.01923, audio_tagging_loss=0.006657, over 15298.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08919, pruned_loss=0.01218, audio_tagging_loss=0.008661, over 3050130.59 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:03:11,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3385340.0, ans=0.125 2023-11-28 06:03:23,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3385406.6666666665, ans=0.125 2023-11-28 06:03:26,611 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:03:28,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=12.0 2023-11-28 06:03:40,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-28 06:03:56,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-28 06:04:00,437 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2850, loss[loss=0.05134, simple_loss=0.06487, pruned_loss=0.00602, audio_tagging_loss=0.01289, over 15431.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08908, pruned_loss=0.01224, audio_tagging_loss=0.008733, over 3052995.76 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:04:06,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3385673.3333333335, ans=0.025 2023-11-28 06:04:12,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3385740.0, ans=0.07 2023-11-28 06:04:25,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3385806.6666666665, ans=0.125 2023-11-28 06:04:26,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3385806.6666666665, ans=0.125 2023-11-28 06:04:40,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2023-11-28 06:04:50,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385940.0, ans=0.1 2023-11-28 06:04:54,049 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-28 06:04:55,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.844e+01 9.452e+01 9.978e+01 1.410e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:04:57,245 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2900, loss[loss=0.07203, simple_loss=0.1031, pruned_loss=0.01421, audio_tagging_loss=0.006267, over 14778.00 frames. 
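The WithLoss entries (scaling.py:1118) report the summed value of auxiliary losses attached to attention-weight tensors; loss-sum=0.000e+00 means the attached penalty is currently zero. The usual mechanics, assumed here rather than quoted from scaling.py, are an autograd function that returns its input unchanged in the forward pass while injecting the auxiliary loss into the backward pass:

    import torch

    # Assumed mechanics behind the WithLoss log lines: x flows through
    # unchanged, and the attached scalar loss receives gradient 1 whenever
    # the main objective is backpropagated, so it is minimised alongside it.
    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x: torch.Tensor, loss: torch.Tensor):
            ctx.save_for_backward(loss)
            return x

        @staticmethod
        def backward(ctx, grad_output: torch.Tensor):
            (loss,) = ctx.saved_tensors
            return grad_output, torch.ones_like(loss)

    # y = WithLoss.apply(attn_weights, aux_penalty) behaves exactly like
    # attn_weights going forward, but also pulls aux_penalty toward zero.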
], tot_loss[loss=0.06623, simple_loss=0.09023, pruned_loss=0.01246, audio_tagging_loss=0.008649, over 3051113.62 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:05:00,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386006.6666666665, ans=0.1 2023-11-28 06:05:08,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3386073.3333333335, ans=0.125 2023-11-28 06:05:08,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386073.3333333335, ans=0.1 2023-11-28 06:05:26,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2023-11-28 06:05:30,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3386140.0, ans=0.0 2023-11-28 06:05:31,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3386206.6666666665, ans=0.125 2023-11-28 06:05:51,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-28 06:05:54,936 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 2950, loss[loss=0.08259, simple_loss=0.1071, pruned_loss=0.02325, audio_tagging_loss=0.005784, over 14299.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09162, pruned_loss=0.0127, audio_tagging_loss=0.008572, over 3050896.01 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:05:57,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3386340.0, ans=0.0 2023-11-28 06:06:09,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3386406.6666666665, ans=0.125 2023-11-28 06:06:20,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3386473.3333333335, ans=0.0 2023-11-28 06:06:21,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3386473.3333333335, ans=0.0 2023-11-28 06:06:22,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3386473.3333333335, ans=0.2 2023-11-28 06:06:24,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3386473.3333333335, ans=0.125 2023-11-28 06:06:45,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=22.5 2023-11-28 06:06:49,068 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-28 06:06:50,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.945e+01 9.706e+01 1.033e+02 1.300e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-28 06:06:55,282 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3000, loss[loss=0.06531, simple_loss=0.08811, pruned_loss=0.01202, audio_tagging_loss=0.009234, over 15378.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09069, pruned_loss=0.01255, audio_tagging_loss=0.008691, over 3046955.93 frames. 
], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:06:55,282 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 06:07:25,752 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8422, 5.8229, 5.9253, 5.8848], device='cuda:1') 2023-11-28 06:07:29,846 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.0576, simple_loss=0.05056, pruned_loss=0.005189, audio_tagging_loss=0.02713, over 4681554.00 frames. 2023-11-28 06:07:29,847 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 06:07:38,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386673.3333333335, ans=0.1 2023-11-28 06:07:53,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3386806.6666666665, ans=0.2 2023-11-28 06:08:00,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386806.6666666665, ans=0.1 2023-11-28 06:08:11,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-28 06:08:24,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-28 06:08:28,281 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3050, loss[loss=0.06343, simple_loss=0.0834, pruned_loss=0.01343, audio_tagging_loss=0.008294, over 14379.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09094, pruned_loss=0.01249, audio_tagging_loss=0.008739, over 3040869.58 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:08:39,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3387073.3333333335, ans=0.0 2023-11-28 06:09:05,153 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:09:22,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-28 06:09:23,659 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.950e+01 9.666e+01 1.022e+02 1.393e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 06:09:26,339 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3100, loss[loss=0.06469, simple_loss=0.09206, pruned_loss=0.01074, audio_tagging_loss=0.007915, over 15145.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0905, pruned_loss=0.01239, audio_tagging_loss=0.008761, over 3042461.03 frames. 
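During each validation pass the script also logs one attention-entropy value per head (the 4-element tensors above match the model's 4-head encoder stacks). A plausible formulation, assumed rather than taken verbatim from zipformer.py:1877, is the mean Shannon entropy of each head's softmaxed attention rows:

    import torch

    # Assumed computation for the attn_weights_entropy lines: mean entropy
    # (in nats) of each head's attention distribution over the keys.
    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys), rows sum to 1
        ent = -(attn * (attn + 1.0e-20).log()).sum(dim=-1)
        return ent.mean(dim=-1)  # one value per head

    w = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(w))  # near log(100) ~= 4.6 for diffuse heads

Under this reading, logged values near 5.8 correspond to heads spreading their mass over roughly exp(5.8) ~= 330 positions.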
], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:09:36,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3387406.6666666665, ans=0.1 2023-11-28 06:10:19,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3387606.6666666665, ans=0.125 2023-11-28 06:10:20,485 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-28 06:10:21,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3387606.6666666665, ans=0.0 2023-11-28 06:10:23,659 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3150, loss[loss=0.06258, simple_loss=0.08814, pruned_loss=0.01108, audio_tagging_loss=0.007428, over 16386.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09036, pruned_loss=0.01236, audio_tagging_loss=0.008783, over 3043094.09 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:10:24,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387673.3333333335, ans=0.1 2023-11-28 06:10:25,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3387673.3333333335, ans=0.2 2023-11-28 06:10:35,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3387740.0, ans=0.0 2023-11-28 06:10:35,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2023-11-28 06:10:55,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-28 06:11:04,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3387873.3333333335, ans=0.95 2023-11-28 06:11:06,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-28 06:11:14,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3387940.0, ans=0.125 2023-11-28 06:11:14,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3387940.0, ans=0.2 2023-11-28 06:11:17,510 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-28 06:11:18,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.906e+01 9.448e+01 1.016e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:11:22,233 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3200, loss[loss=0.07312, simple_loss=0.1053, pruned_loss=0.01156, audio_tagging_loss=0.008914, over 15200.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09011, pruned_loss=0.01229, audio_tagging_loss=0.008813, over 3046077.37 frames. 
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:11:24,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3388006.6666666665, ans=0.0 2023-11-28 06:11:39,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.00 vs. limit=10.0 2023-11-28 06:11:40,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3388073.3333333335, ans=0.125 2023-11-28 06:11:41,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2023-11-28 06:11:46,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3388140.0, ans=0.0 2023-11-28 06:11:55,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388206.6666666665, ans=0.1 2023-11-28 06:12:13,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-28 06:12:15,242 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-28 06:12:18,445 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3250, loss[loss=0.05094, simple_loss=0.07081, pruned_loss=0.005285, audio_tagging_loss=0.01025, over 15176.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08953, pruned_loss=0.01207, audio_tagging_loss=0.008889, over 3053691.08 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:12:25,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-28 06:12:28,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3388340.0, ans=0.05 2023-11-28 06:12:30,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3388406.6666666665, ans=0.0 2023-11-28 06:12:32,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-28 06:12:37,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. 
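The ScheduledFloat entries dump module hyperparameters (skip rates, balancer probabilities, scale floors) whose values are scheduled against the global batch_count; at batch_count ~ 3.38e6 every schedule here settled long ago, hence the repeated constants such as ans=0.125 and ans=0.0. A minimal sketch of piecewise-linear scheduling, assuming breakpoints of the form (batch_count, value); the real class lives in icefall's scaling.py:

    from typing import Sequence, Tuple

    # Minimal piecewise-linear schedule keyed on batch_count; an assumed
    # shape for the ScheduledFloat breakpoints, not the scaling.py code.
    def scheduled_float(points: Sequence[Tuple[float, float]],
                        batch_count: float) -> float:
        pts = sorted(points)
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return pts[-1][1]

    # A skip-rate schedule that decays to zero early in training has been
    # pinned at its final value for millions of batches by this point:
    print(scheduled_float([(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)],
                          3384406.67))  # 0.0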
limit=15.0 2023-11-28 06:12:42,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3388473.3333333335, ans=0.025 2023-11-28 06:12:58,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3388540.0, ans=0.0 2023-11-28 06:12:59,188 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:13:05,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3388606.6666666665, ans=0.2 2023-11-28 06:13:12,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3388606.6666666665, ans=15.0 2023-11-28 06:13:13,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-28 06:13:15,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.835e+01 9.450e+01 1.028e+02 1.248e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:13:16,201 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3300, loss[loss=0.085, simple_loss=0.1221, pruned_loss=0.0148, audio_tagging_loss=0.009178, over 16562.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09015, pruned_loss=0.01229, audio_tagging_loss=0.008997, over 3047659.54 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 06:13:41,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.66 vs. limit=22.5 2023-11-28 06:13:41,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3388806.6666666665, ans=0.125 2023-11-28 06:13:44,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3388806.6666666665, ans=0.0 2023-11-28 06:13:54,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388873.3333333335, ans=0.1 2023-11-28 06:14:07,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3388940.0, ans=0.1 2023-11-28 06:14:09,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-28 06:14:13,198 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3350, loss[loss=0.06366, simple_loss=0.09065, pruned_loss=0.009309, audio_tagging_loss=0.009031, over 15531.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08992, pruned_loss=0.0122, audio_tagging_loss=0.008959, over 3049845.50 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 06:14:30,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2023-11-28 06:14:33,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-11-28 06:14:38,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. 
limit=12.0 2023-11-28 06:15:03,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3389273.3333333335, ans=0.09899494936611666 2023-11-28 06:15:08,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-28 06:15:10,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.815e+01 9.381e+01 1.020e+02 1.211e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 06:15:11,910 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3400, loss[loss=0.04924, simple_loss=0.05526, pruned_loss=0.01168, audio_tagging_loss=0.009924, over 13714.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09061, pruned_loss=0.01253, audio_tagging_loss=0.008752, over 3048121.49 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:15:20,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3389340.0, ans=0.125 2023-11-28 06:15:26,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3389406.6666666665, ans=0.2 2023-11-28 06:15:26,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3389406.6666666665, ans=0.0 2023-11-28 06:16:06,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-28 06:16:09,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2023-11-28 06:16:09,959 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3450, loss[loss=0.08592, simple_loss=0.1283, pruned_loss=0.01537, audio_tagging_loss=0.00639, over 14638.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09157, pruned_loss=0.01256, audio_tagging_loss=0.008584, over 3050433.88 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:16:34,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2023-11-28 06:16:51,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3389873.3333333335, ans=0.07 2023-11-28 06:16:59,512 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:17:03,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-28 06:17:06,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.735e+01 9.510e+01 1.030e+02 1.229e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 06:17:07,007 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3500, loss[loss=0.07488, simple_loss=0.1049, pruned_loss=0.01603, audio_tagging_loss=0.006399, over 15312.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09047, pruned_loss=0.01244, audio_tagging_loss=0.008509, over 3048868.62 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:17:09,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. 
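Each Whitening entry compares a per-module whiteness statistic of the activations against a limit (metric=X vs. limit=Y); the penalty only activates once the metric exceeds the limit, so most of these lines are purely informational. One standard formulation, an assumption here rather than the exact scaling.py definition, is the eigenvalue-dispersion ratio of the per-group feature covariance, which equals 1.0 for perfectly white features and grows when a few directions dominate:

    import torch

    # Assumed whiteness statistic: d * trace(C @ C) / trace(C)**2 per
    # channel group, i.e. E[lambda^2] / E[lambda]^2 over the covariance
    # eigenvalues; 1.0 when C is a multiple of the identity.
    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n                     # (groups, d, d)
        d = cov.shape[-1]
        num = (cov * cov).sum(dim=(1, 2)) * d               # trace(C @ C) * d
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2  # trace(C) ** 2
        return (num / den).mean().item()

    feats = torch.randn(2000, 384)     # nearly white activations
    print(whitening_metric(feats, 1))  # ~1.2, far below limit=15.0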
limit=10.0 2023-11-28 06:17:14,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3390006.6666666665, ans=0.0 2023-11-28 06:17:16,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3390006.6666666665, ans=0.125 2023-11-28 06:17:18,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3390073.3333333335, ans=0.125 2023-11-28 06:17:40,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3390140.0, ans=0.0 2023-11-28 06:17:40,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3390140.0, ans=0.2 2023-11-28 06:17:41,211 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:17:42,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3390206.6666666665, ans=0.125 2023-11-28 06:17:50,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390206.6666666665, ans=0.1 2023-11-28 06:17:57,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3390273.3333333335, ans=0.125 2023-11-28 06:18:00,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3390273.3333333335, ans=0.0 2023-11-28 06:18:01,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-28 06:18:02,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3390273.3333333335, ans=0.2 2023-11-28 06:18:04,914 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3550, loss[loss=0.07119, simple_loss=0.1027, pruned_loss=0.01316, audio_tagging_loss=0.006682, over 14287.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08925, pruned_loss=0.01235, audio_tagging_loss=0.008541, over 3043719.04 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:18:23,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2023-11-28 06:18:32,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3390473.3333333335, ans=0.125 2023-11-28 06:19:00,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-28 06:19:04,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.807e+01 9.210e+01 1.006e+02 1.301e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 06:19:04,085 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3600, loss[loss=0.06607, simple_loss=0.09696, pruned_loss=0.009032, audio_tagging_loss=0.008556, over 14321.00 frames. 
], tot_loss[loss=0.06484, simple_loss=0.0884, pruned_loss=0.01215, audio_tagging_loss=0.008495, over 3042592.95 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:19:11,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3390673.3333333335, ans=0.125 2023-11-28 06:19:21,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2023-11-28 06:19:43,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2023-11-28 06:19:57,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-28 06:20:00,994 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3650, loss[loss=0.05701, simple_loss=0.0805, pruned_loss=0.008606, audio_tagging_loss=0.008157, over 15357.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08779, pruned_loss=0.01202, audio_tagging_loss=0.008517, over 3037634.18 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:20:04,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3391006.6666666665, ans=0.125 2023-11-28 06:20:12,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3391073.3333333335, ans=0.2 2023-11-28 06:20:23,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3391140.0, ans=0.125 2023-11-28 06:20:28,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3391140.0, ans=0.07 2023-11-28 06:20:31,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3391140.0, ans=0.0 2023-11-28 06:20:40,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3391206.6666666665, ans=0.125 2023-11-28 06:20:48,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3391273.3333333335, ans=0.07 2023-11-28 06:20:54,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3391273.3333333335, ans=0.2 2023-11-28 06:20:55,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508700 2023-11-28 06:20:58,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.782e+01 9.556e+01 1.009e+02 1.270e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 06:20:58,195 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3700, loss[loss=0.07494, simple_loss=0.09969, pruned_loss=0.01567, audio_tagging_loss=0.009426, over 15589.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08854, pruned_loss=0.01217, audio_tagging_loss=0.008489, over 3038122.68 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:21:01,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-28 06:21:15,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.72 vs. 
limit=22.5 2023-11-28 06:21:17,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3391406.6666666665, ans=0.0 2023-11-28 06:21:23,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3391473.3333333335, ans=0.125 2023-11-28 06:21:27,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3391473.3333333335, ans=0.1 2023-11-28 06:21:35,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3391540.0, ans=0.0 2023-11-28 06:21:38,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391540.0, ans=0.1 2023-11-28 06:21:53,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-28 06:21:56,810 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3750, loss[loss=0.06237, simple_loss=0.08317, pruned_loss=0.01135, audio_tagging_loss=0.00944, over 14813.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08847, pruned_loss=0.01212, audio_tagging_loss=0.008533, over 3043536.71 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:22:19,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3391806.6666666665, ans=0.125 2023-11-28 06:22:22,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-28 06:22:26,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3391806.6666666665, ans=0.0 2023-11-28 06:22:33,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3391873.3333333335, ans=0.0 2023-11-28 06:22:36,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3391873.3333333335, ans=0.1 2023-11-28 06:22:40,361 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:22:47,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3391940.0, ans=0.0 2023-11-28 06:22:50,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-28 06:22:53,889 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3800, loss[loss=0.06004, simple_loss=0.07426, pruned_loss=0.01215, audio_tagging_loss=0.01076, over 14521.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08914, pruned_loss=0.01245, audio_tagging_loss=0.008572, over 3040313.85 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:22:54,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.959e+01 9.739e+01 1.027e+02 1.673e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 06:22:57,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3392006.6666666665, ans=0.125 2023-11-28 06:23:32,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2023-11-28 06:23:40,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3392273.3333333335, ans=0.0 2023-11-28 06:23:48,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-28 06:23:51,668 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3850, loss[loss=0.06387, simple_loss=0.08549, pruned_loss=0.01198, audio_tagging_loss=0.00915, over 16000.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08899, pruned_loss=0.01227, audio_tagging_loss=0.008702, over 3041759.42 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:23:57,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3392340.0, ans=0.125 2023-11-28 06:24:07,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3392406.6666666665, ans=10.0 2023-11-28 06:24:14,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3392473.3333333335, ans=0.2 2023-11-28 06:24:17,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3392473.3333333335, ans=0.1 2023-11-28 06:24:23,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3392473.3333333335, ans=15.0 2023-11-28 06:24:46,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-28 06:24:50,033 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3900, loss[loss=0.06523, simple_loss=0.07733, pruned_loss=0.01413, audio_tagging_loss=0.01244, over 16022.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08876, pruned_loss=0.01217, audio_tagging_loss=0.008771, over 3047518.08 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:24:51,109 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.618e+01 9.454e+01 1.005e+02 2.661e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-28 06:24:55,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3392673.3333333335, ans=15.0 2023-11-28 06:25:01,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3392740.0, ans=0.035 2023-11-28 06:25:03,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3392740.0, ans=0.0 2023-11-28 06:25:05,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. 
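The optim.py lines print the min/25%/50%/75%/max of recently observed gradient norms together with the active clipping threshold; with Clipping_scale=2.0 the threshold tracks twice the median (2 x 9.739e+01 ~= 1.948e+02 in the entry above), and percent-clipped is the share of recent updates whose norm exceeded it. A sketch of median-based clipping under those assumptions; the real logic sits inside icefall's ScaledAdam optimizer:

    import collections
    import torch

    # Median-based gradient clipping: threshold = clipping_scale * median
    # of recent global grad norms. A sketch of the behaviour suggested by
    # the log, not the ScaledAdam implementation.
    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=history)

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads])
            ).item()
            self.norms.append(norm)
            threshold = self.clipping_scale * float(
                torch.tensor(list(self.norms)).median()
            )
            if norm > threshold:  # counted in the percent-clipped statistic
                for g in grads:
                    g.mul_(threshold / norm)
            return threshold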
limit=15.0 2023-11-28 06:25:43,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-28 06:25:44,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-28 06:25:46,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3392940.0, ans=0.0 2023-11-28 06:25:48,022 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 3950, loss[loss=0.05416, simple_loss=0.0674, pruned_loss=0.009603, audio_tagging_loss=0.01086, over 14232.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08856, pruned_loss=0.01207, audio_tagging_loss=0.008833, over 3042390.02 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:25:48,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3393006.6666666665, ans=0.0 2023-11-28 06:26:22,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3393206.6666666665, ans=0.0 2023-11-28 06:26:32,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3393206.6666666665, ans=0.125 2023-11-28 06:26:42,379 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-28 06:26:46,285 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4000, loss[loss=0.06416, simple_loss=0.08154, pruned_loss=0.0156, audio_tagging_loss=0.007789, over 15358.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08913, pruned_loss=0.01216, audio_tagging_loss=0.008927, over 3034030.44 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:26:47,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.879e+01 9.661e+01 1.044e+02 1.748e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 06:26:48,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3393340.0, ans=0.1 2023-11-28 06:26:50,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3393340.0, ans=0.0 2023-11-28 06:27:09,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3393473.3333333335, ans=0.0 2023-11-28 06:27:14,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3393473.3333333335, ans=0.125 2023-11-28 06:27:21,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-11-28 06:27:32,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-28 06:27:40,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509050 2023-11-28 06:27:41,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-28 06:27:44,185 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4050, loss[loss=0.07995, simple_loss=0.1124, pruned_loss=0.01615, audio_tagging_loss=0.007597, over 15531.00 frames. 
], tot_loss[loss=0.066, simple_loss=0.08965, pruned_loss=0.01225, audio_tagging_loss=0.008923, over 3033550.73 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:27:50,678 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:27:51,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3393673.3333333335, ans=0.035 2023-11-28 06:28:04,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3393740.0, ans=0.1 2023-11-28 06:28:19,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3393873.3333333335, ans=0.07 2023-11-28 06:28:22,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3393873.3333333335, ans=0.95 2023-11-28 06:28:23,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2023-11-28 06:28:28,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3393873.3333333335, ans=0.125 2023-11-28 06:28:32,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-28 06:28:37,616 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509100 2023-11-28 06:28:40,843 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4100, loss[loss=0.07805, simple_loss=0.1076, pruned_loss=0.01803, audio_tagging_loss=0.006199, over 14016.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0906, pruned_loss=0.0123, audio_tagging_loss=0.00887, over 3035752.77 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:28:43,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.020e+01 9.493e+01 1.014e+02 1.731e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 06:29:06,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394140.0, ans=0.1 2023-11-28 06:29:20,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.57 vs. limit=22.5 2023-11-28 06:29:34,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3394273.3333333335, ans=0.125 2023-11-28 06:29:34,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2023-11-28 06:29:35,295 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509150 2023-11-28 06:29:38,587 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4150, loss[loss=0.0583, simple_loss=0.07709, pruned_loss=0.01387, audio_tagging_loss=0.00589, over 14581.00 frames. 
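The recurring WARNING lines exclude AudioSet placeholder cuts whose encoder output would be shorter than the BPE token sequence: 100 input frames shrink to 23 after the roughly 4x convolutional subsampling, one frame fewer than the 24 tokens, and the transducer loss requires T >= U. A sketch of such a filter; the exact output-length formula of the subsampling frontend is an assumption, chosen to match the logged 100 -> 23:

    # Drop cuts whose subsampled length cannot cover the token sequence,
    # matching the excluded 100-frame / 24-token cuts in the log.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2) // 2  # assumed ~4x frontend

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as logged
    print(keep_cut(100, 24))              # False: the cut is excluded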
], tot_loss[loss=0.06659, simple_loss=0.0909, pruned_loss=0.01236, audio_tagging_loss=0.008775, over 3039808.68 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:29:51,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3394406.6666666665, ans=0.125 2023-11-28 06:29:58,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394406.6666666665, ans=0.1 2023-11-28 06:29:59,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394406.6666666665, ans=0.1 2023-11-28 06:30:09,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2023-11-28 06:30:10,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3394473.3333333335, ans=0.0 2023-11-28 06:30:25,694 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:30:26,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3394606.6666666665, ans=0.125 2023-11-28 06:30:33,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509200 2023-11-28 06:30:36,810 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4200, loss[loss=0.05871, simple_loss=0.07921, pruned_loss=0.01069, audio_tagging_loss=0.008418, over 14408.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09024, pruned_loss=0.01232, audio_tagging_loss=0.008572, over 3041002.91 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:30:39,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.720e+01 9.332e+01 1.017e+02 1.296e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 06:30:44,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.90 vs. limit=15.0 2023-11-28 06:30:46,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3394673.3333333335, ans=0.125 2023-11-28 06:30:52,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3394740.0, ans=0.125 2023-11-28 06:30:53,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3394740.0, ans=0.0 2023-11-28 06:31:09,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.63 vs. 
limit=15.0 2023-11-28 06:31:17,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3394873.3333333335, ans=0.0 2023-11-28 06:31:18,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-28 06:31:31,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2023-11-28 06:31:32,133 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509250 2023-11-28 06:31:33,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=12.0 2023-11-28 06:31:35,363 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4250, loss[loss=0.07012, simple_loss=0.09898, pruned_loss=0.01531, audio_tagging_loss=0.005317, over 16642.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09147, pruned_loss=0.01253, audio_tagging_loss=0.008439, over 3048091.10 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:31:41,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-28 06:31:47,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3395073.3333333335, ans=0.125 2023-11-28 06:31:58,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3395140.0, ans=0.0 2023-11-28 06:32:25,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2023-11-28 06:32:29,187 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509300 2023-11-28 06:32:33,201 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4300, loss[loss=0.07429, simple_loss=0.1038, pruned_loss=0.01585, audio_tagging_loss=0.006561, over 14722.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09117, pruned_loss=0.01265, audio_tagging_loss=0.008466, over 3046128.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:32:35,395 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.011e+01 8.843e+01 9.507e+01 1.023e+02 2.128e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 06:32:37,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-11-28 06:32:40,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.24 vs. 
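The model.py:807 lines report the encoder freezing state alongside the global batch index every 50 batches; since this run trains with freeze_encoder disabled and no freezing window, the gate never triggers and always prints False. What such a gate typically looks like, as an assumed sketch rather than the model.py code:

    # Assumed gate behind the "Freeze_encoder: ...; Current batch idx"
    # lines: encoder parameters stop receiving gradients while frozen.
    def maybe_freeze_encoder(encoder, batch_idx: int,
                             freeze_encoder: bool = False,
                             freeze_encoder_steps: int = -1) -> bool:
        frozen = freeze_encoder or (0 <= batch_idx < freeze_encoder_steps)
        for p in encoder.parameters():
            p.requires_grad = not frozen
        return frozen  # False for this run, as logged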
limit=15.0 2023-11-28 06:32:46,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395406.6666666665, ans=0.1 2023-11-28 06:32:49,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3395406.6666666665, ans=0.125 2023-11-28 06:32:51,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3395406.6666666665, ans=0.125 2023-11-28 06:32:52,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3395406.6666666665, ans=0.125 2023-11-28 06:33:25,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2023-11-28 06:33:27,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3395606.6666666665, ans=0.0 2023-11-28 06:33:27,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509350 2023-11-28 06:33:31,220 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4350, loss[loss=0.09404, simple_loss=0.1387, pruned_loss=0.01849, audio_tagging_loss=0.006216, over 15856.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09109, pruned_loss=0.01259, audio_tagging_loss=0.008449, over 3047749.57 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:33:53,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3395806.6666666665, ans=0.0 2023-11-28 06:34:06,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3395873.3333333335, ans=0.0 2023-11-28 06:34:11,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3395873.3333333335, ans=0.125 2023-11-28 06:34:12,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3395873.3333333335, ans=10.0 2023-11-28 06:34:22,009 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:34:24,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3395940.0, ans=0.125 2023-11-28 06:34:26,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509400 2023-11-28 06:34:29,647 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4400, loss[loss=0.0574, simple_loss=0.07527, pruned_loss=0.01064, audio_tagging_loss=0.009122, over 16419.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09028, pruned_loss=0.0126, audio_tagging_loss=0.008437, over 3047507.83 frames. 
], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:34:31,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3396006.6666666665, ans=0.125 2023-11-28 06:34:31,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.958e+01 9.354e+01 1.037e+02 1.325e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 06:34:48,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396073.3333333335, ans=0.125 2023-11-28 06:35:06,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3396206.6666666665, ans=0.125 2023-11-28 06:35:13,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3396206.6666666665, ans=0.125 2023-11-28 06:35:15,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-28 06:35:17,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3396273.3333333335, ans=0.0 2023-11-28 06:35:23,651 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509450 2023-11-28 06:35:23,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3396273.3333333335, ans=0.125 2023-11-28 06:35:26,830 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4450, loss[loss=0.0649, simple_loss=0.08951, pruned_loss=0.01012, audio_tagging_loss=0.01003, over 15508.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09042, pruned_loss=0.01263, audio_tagging_loss=0.008518, over 3049736.36 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:35:34,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396340.0, ans=0.125 2023-11-28 06:35:43,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2023-11-28 06:35:49,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3396473.3333333335, ans=0.0 2023-11-28 06:36:04,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-28 06:36:21,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509500 2023-11-28 06:36:24,893 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4500, loss[loss=0.05601, simple_loss=0.08168, pruned_loss=0.007485, audio_tagging_loss=0.007681, over 15300.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09051, pruned_loss=0.01252, audio_tagging_loss=0.0084, over 3046177.62 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:36:27,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.759e+01 9.220e+01 9.806e+01 1.292e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 06:36:31,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3396673.3333333335, ans=0.125 2023-11-28 06:36:32,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3396673.3333333335, ans=0.1 2023-11-28 06:36:40,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396740.0, ans=0.1 2023-11-28 06:36:48,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3396806.6666666665, ans=0.0 2023-11-28 06:36:52,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396806.6666666665, ans=0.1 2023-11-28 06:37:05,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3396873.3333333335, ans=0.125 2023-11-28 06:37:19,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509550 2023-11-28 06:37:22,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3397006.6666666665, ans=0.125 2023-11-28 06:37:23,190 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4550, loss[loss=0.08491, simple_loss=0.1158, pruned_loss=0.02094, audio_tagging_loss=0.006081, over 14965.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08969, pruned_loss=0.01235, audio_tagging_loss=0.008497, over 3041943.03 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:37:34,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3397073.3333333335, ans=0.125 2023-11-28 06:37:47,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3397140.0, ans=0.0 2023-11-28 06:38:11,287 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:38:16,805 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509600 2023-11-28 06:38:17,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3397273.3333333335, ans=0.2 2023-11-28 06:38:20,306 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4600, loss[loss=0.05678, simple_loss=0.07234, pruned_loss=0.01113, audio_tagging_loss=0.009484, over 13967.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08925, pruned_loss=0.01223, audio_tagging_loss=0.008613, over 3038051.90 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:38:20,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3397340.0, ans=0.1 2023-11-28 06:38:22,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.724e+01 9.423e+01 1.019e+02 1.398e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:38:32,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3397406.6666666665, ans=0.0 2023-11-28 06:38:35,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3397406.6666666665, ans=0.1 2023-11-28 06:39:02,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3397540.0, ans=0.125 2023-11-28 06:39:14,825 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509650 2023-11-28 06:39:17,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3397673.3333333335, ans=0.1 2023-11-28 06:39:18,115 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4650, loss[loss=0.04629, simple_loss=0.06829, pruned_loss=0.005621, audio_tagging_loss=0.006519, over 15859.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08862, pruned_loss=0.01197, audio_tagging_loss=0.008743, over 3044546.31 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:39:37,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3397740.0, ans=0.1 2023-11-28 06:39:40,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3397806.6666666665, ans=0.125 2023-11-28 06:40:02,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3397873.3333333335, ans=0.015 2023-11-28 06:40:07,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-28 06:40:13,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509700 2023-11-28 06:40:16,813 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4700, loss[loss=0.06568, simple_loss=0.08687, pruned_loss=0.01167, audio_tagging_loss=0.01057, over 16003.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08944, pruned_loss=0.01215, audio_tagging_loss=0.008812, over 3046513.51 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:40:18,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.857e+01 9.480e+01 1.024e+02 1.425e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 06:40:26,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=22.5 2023-11-28 06:41:10,398 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509750 2023-11-28 06:41:13,608 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4750, loss[loss=0.08499, simple_loss=0.1193, pruned_loss=0.01785, audio_tagging_loss=0.007518, over 14944.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08999, pruned_loss=0.01221, audio_tagging_loss=0.008908, over 3048624.21 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:41:32,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-28 06:41:38,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3398473.3333333335, ans=0.2 2023-11-28 06:42:06,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3398606.6666666665, ans=0.1 2023-11-28 06:42:07,319 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509800 2023-11-28 06:42:11,430 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4800, loss[loss=0.07834, simple_loss=0.1131, pruned_loss=0.0153, audio_tagging_loss=0.006477, over 15326.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09, pruned_loss=0.01229, audio_tagging_loss=0.008911, over 3048714.66 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:42:13,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.791e+01 9.387e+01 1.001e+02 1.346e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 06:42:18,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3398673.3333333335, ans=0.125 2023-11-28 06:42:26,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-28 06:42:29,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398740.0, ans=0.1 2023-11-28 06:42:58,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3398940.0, ans=0.1 2023-11-28 06:43:04,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398940.0, ans=0.1 2023-11-28 06:43:05,731 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509850 2023-11-28 06:43:09,514 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4850, loss[loss=0.08544, simple_loss=0.1236, pruned_loss=0.01733, audio_tagging_loss=0.006316, over 15713.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09108, pruned_loss=0.0124, audio_tagging_loss=0.00893, over 3052138.62 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:43:16,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3399006.6666666665, ans=0.0 2023-11-28 06:43:24,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3399073.3333333335, ans=0.125 2023-11-28 06:43:29,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399073.3333333335, ans=0.1 2023-11-28 06:43:45,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3399206.6666666665, ans=0.1 2023-11-28 06:43:50,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3399206.6666666665, ans=0.125 2023-11-28 06:44:02,651 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509900 2023-11-28 06:44:05,799 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4900, loss[loss=0.05275, simple_loss=0.06887, pruned_loss=0.009974, audio_tagging_loss=0.008341, over 15593.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09062, pruned_loss=0.01225, audio_tagging_loss=0.008937, over 3051508.11 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:44:07,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.789e+01 9.268e+01 1.027e+02 1.406e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 06:44:08,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3399340.0, ans=0.125 2023-11-28 06:44:26,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3399406.6666666665, ans=0.125 2023-11-28 06:44:36,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0 2023-11-28 06:44:37,818 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:44:45,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2023-11-28 06:44:55,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2023-11-28 06:44:58,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3399606.6666666665, ans=0.0 2023-11-28 06:44:59,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 509950 2023-11-28 06:45:03,475 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 4950, loss[loss=0.05294, simple_loss=0.065, pruned_loss=0.00841, audio_tagging_loss=0.01203, over 14486.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09068, pruned_loss=0.01228, audio_tagging_loss=0.008732, over 3050801.39 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:45:18,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399740.0, ans=0.1 2023-11-28 06:45:25,382 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:45:28,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3399806.6666666665, ans=0.0 2023-11-28 06:45:36,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3399806.6666666665, ans=0.2 2023-11-28 06:45:38,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3399873.3333333335, ans=0.5 2023-11-28 06:45:46,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3399873.3333333335, ans=0.125 2023-11-28 06:45:46,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3399873.3333333335, ans=0.0 2023-11-28 06:45:57,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510000 2023-11-28 06:46:01,570 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5000, loss[loss=0.08688, simple_loss=0.1267, pruned_loss=0.01677, audio_tagging_loss=0.006754, over 15582.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.0907, pruned_loss=0.01228, audio_tagging_loss=0.008538, over 3047897.14 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:46:04,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3400006.6666666665, ans=0.2 2023-11-28 06:46:05,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.682e+01 9.362e+01 1.003e+02 1.327e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 06:46:10,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3400006.6666666665, ans=0.0 2023-11-28 06:46:23,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3400140.0, ans=0.0 2023-11-28 06:46:37,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3400206.6666666665, ans=0.0 2023-11-28 06:46:56,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510050 2023-11-28 06:46:59,740 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5050, loss[loss=0.06603, simple_loss=0.08831, pruned_loss=0.01245, audio_tagging_loss=0.009425, over 15264.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09067, pruned_loss=0.01234, audio_tagging_loss=0.008493, over 3047961.54 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:47:49,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400606.6666666665, ans=0.1 2023-11-28 06:47:51,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3400606.6666666665, ans=0.125 2023-11-28 06:47:52,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.13 vs. 
limit=22.5 2023-11-28 06:47:53,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510100 2023-11-28 06:47:56,494 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5100, loss[loss=0.03102, simple_loss=0.03368, pruned_loss=0.004797, audio_tagging_loss=0.009377, over 15469.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09025, pruned_loss=0.0123, audio_tagging_loss=0.008484, over 3052859.38 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:48:00,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.708e+01 9.390e+01 1.019e+02 1.146e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 06:48:06,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3400673.3333333335, ans=0.125 2023-11-28 06:48:08,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3400740.0, ans=0.125 2023-11-28 06:48:27,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2023-11-28 06:48:33,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3400873.3333333335, ans=0.0 2023-11-28 06:48:35,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3400873.3333333335, ans=0.1 2023-11-28 06:48:51,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510150 2023-11-28 06:48:54,194 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5150, loss[loss=0.05547, simple_loss=0.07382, pruned_loss=0.00941, audio_tagging_loss=0.009146, over 14463.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08858, pruned_loss=0.012, audio_tagging_loss=0.008455, over 3051329.50 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:49:00,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3401006.6666666665, ans=0.025 2023-11-28 06:49:08,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. 
limit=15.0 2023-11-28 06:49:11,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3401073.3333333335, ans=0.125 2023-11-28 06:49:16,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3401140.0, ans=0.125 2023-11-28 06:49:18,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3401140.0, ans=0.125 2023-11-28 06:49:31,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3401206.6666666665, ans=0.0 2023-11-28 06:49:40,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3401273.3333333335, ans=0.125 2023-11-28 06:49:49,098 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510200 2023-11-28 06:49:51,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3401340.0, ans=0.125 2023-11-28 06:49:52,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-28 06:49:52,689 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5200, loss[loss=0.04675, simple_loss=0.06259, pruned_loss=0.004954, audio_tagging_loss=0.0105, over 15294.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08943, pruned_loss=0.01214, audio_tagging_loss=0.008465, over 3041447.08 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:49:56,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.684e+01 9.283e+01 1.002e+02 1.274e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 06:50:02,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3401340.0, ans=0.0 2023-11-28 06:50:09,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3401406.6666666665, ans=0.015 2023-11-28 06:50:20,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3401473.3333333335, ans=0.09899494936611666 2023-11-28 06:50:22,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3401473.3333333335, ans=0.125 2023-11-28 06:50:40,733 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:50:47,141 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510250 2023-11-28 06:50:50,402 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5250, loss[loss=0.07354, simple_loss=0.09639, pruned_loss=0.01587, audio_tagging_loss=0.009467, over 16120.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09012, pruned_loss=0.01221, audio_tagging_loss=0.008432, over 3041845.13 frames. 
], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:50:56,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3401673.3333333335, ans=0.1 2023-11-28 06:51:26,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3401873.3333333335, ans=0.0 2023-11-28 06:51:29,906 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:51:44,528 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510300 2023-11-28 06:51:47,665 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5300, loss[loss=0.07283, simple_loss=0.1022, pruned_loss=0.01238, audio_tagging_loss=0.009332, over 15069.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09028, pruned_loss=0.01229, audio_tagging_loss=0.00839, over 3039648.56 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:50,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.883e+01 9.472e+01 1.016e+02 1.198e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:52:04,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3402073.3333333335, ans=0.07 2023-11-28 06:52:09,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3402140.0, ans=0.0 2023-11-28 06:52:11,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3402140.0, ans=0.0 2023-11-28 06:52:12,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402140.0, ans=0.1 2023-11-28 06:52:13,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3402140.0, ans=0.125 2023-11-28 06:52:25,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2023-11-28 06:52:30,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3402206.6666666665, ans=0.125 2023-11-28 06:52:31,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3402206.6666666665, ans=0.0 2023-11-28 06:52:34,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2023-11-28 06:52:42,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510350 2023-11-28 06:52:45,618 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5350, loss[loss=0.06784, simple_loss=0.08786, pruned_loss=0.01306, audio_tagging_loss=0.01085, over 14884.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09101, pruned_loss=0.01245, audio_tagging_loss=0.008364, over 3044008.51 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:52:51,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3402340.0, ans=0.0 2023-11-28 06:52:52,351 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:53:17,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3402473.3333333335, ans=0.2 2023-11-28 06:53:39,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510400 2023-11-28 06:53:42,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3402673.3333333335, ans=0.125 2023-11-28 06:53:43,132 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5400, loss[loss=0.07216, simple_loss=0.1001, pruned_loss=0.01592, audio_tagging_loss=0.006182, over 15354.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09087, pruned_loss=0.01241, audio_tagging_loss=0.008471, over 3043704.42 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:53:47,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.616e+01 9.187e+01 1.017e+02 1.243e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-28 06:53:54,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3402740.0, ans=0.0 2023-11-28 06:54:00,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3402740.0, ans=0.5 2023-11-28 06:54:16,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3402873.3333333335, ans=0.125 2023-11-28 06:54:18,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3402873.3333333335, ans=0.1 2023-11-28 06:54:20,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3402873.3333333335, ans=0.0 2023-11-28 06:54:37,167 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510450 2023-11-28 06:54:40,382 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5450, loss[loss=0.06532, simple_loss=0.09654, pruned_loss=0.01014, audio_tagging_loss=0.006905, over 15610.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09075, pruned_loss=0.01255, audio_tagging_loss=0.008557, over 3034792.20 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:54:43,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3403006.6666666665, ans=0.125 2023-11-28 06:54:50,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3403006.6666666665, ans=0.09899494936611666 2023-11-28 06:54:53,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.79 vs. 
limit=22.5 2023-11-28 06:55:24,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3403206.6666666665, ans=0.04949747468305833 2023-11-28 06:55:34,808 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510500 2023-11-28 06:55:38,009 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5500, loss[loss=0.07518, simple_loss=0.105, pruned_loss=0.01423, audio_tagging_loss=0.008473, over 15333.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09101, pruned_loss=0.01265, audio_tagging_loss=0.008566, over 3041103.49 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:55:42,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.928e+01 9.472e+01 1.024e+02 1.464e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:55:45,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3403340.0, ans=0.125 2023-11-28 06:55:59,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3403473.3333333335, ans=0.0 2023-11-28 06:56:05,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3403473.3333333335, ans=0.125 2023-11-28 06:56:12,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3403540.0, ans=0.125 2023-11-28 06:56:17,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3403540.0, ans=0.125 2023-11-28 06:56:21,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3403540.0, ans=0.95 2023-11-28 06:56:31,803 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510550 2023-11-28 06:56:34,972 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5550, loss[loss=0.07408, simple_loss=0.09761, pruned_loss=0.01478, audio_tagging_loss=0.01049, over 17013.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09077, pruned_loss=0.01261, audio_tagging_loss=0.008654, over 3047340.24 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:56:38,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.69 vs. limit=10.0 2023-11-28 06:56:45,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3403673.3333333335, ans=0.2 2023-11-28 06:56:45,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3403673.3333333335, ans=0.0 2023-11-28 06:57:07,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3403806.6666666665, ans=0.125 2023-11-28 06:57:14,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. 
limit=15.0 2023-11-28 06:57:21,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3403940.0, ans=0.125 2023-11-28 06:57:21,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3403940.0, ans=0.2 2023-11-28 06:57:27,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3403940.0, ans=0.0 2023-11-28 06:57:29,867 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510600 2023-11-28 06:57:32,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3404006.6666666665, ans=0.2 2023-11-28 06:57:33,361 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5600, loss[loss=0.06163, simple_loss=0.07885, pruned_loss=0.01295, audio_tagging_loss=0.009249, over 16318.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09121, pruned_loss=0.01267, audio_tagging_loss=0.008703, over 3053626.46 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:57:33,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3404006.6666666665, ans=0.125 2023-11-28 06:57:33,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3404006.6666666665, ans=0.0 2023-11-28 06:57:34,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3404006.6666666665, ans=0.2 2023-11-28 06:57:36,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2023-11-28 06:57:37,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 9.053e+01 9.702e+01 1.068e+02 1.336e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 06:57:45,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2023-11-28 06:57:46,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3404073.3333333335, ans=10.0 2023-11-28 06:57:49,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3404073.3333333335, ans=0.125 2023-11-28 06:58:17,464 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:58:18,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=22.5 2023-11-28 06:58:27,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-28 06:58:30,295 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5650, loss[loss=0.0545, simple_loss=0.0679, pruned_loss=0.007695, audio_tagging_loss=0.01286, over 14478.00 frames. 
], tot_loss[loss=0.06609, simple_loss=0.08998, pruned_loss=0.01237, audio_tagging_loss=0.008736, over 3049923.03 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:58:34,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3404340.0, ans=0.1 2023-11-28 06:58:36,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=22.5 2023-11-28 06:58:45,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2023-11-28 06:58:49,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3404406.6666666665, ans=0.1 2023-11-28 06:58:49,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.98 vs. limit=10.0 2023-11-28 06:58:52,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3404473.3333333335, ans=0.125 2023-11-28 06:59:06,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3404540.0, ans=0.04949747468305833 2023-11-28 06:59:07,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3404540.0, ans=0.1 2023-11-28 06:59:24,116 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-28 06:59:27,445 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5700, loss[loss=0.04944, simple_loss=0.0662, pruned_loss=0.007606, audio_tagging_loss=0.008729, over 14272.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09032, pruned_loss=0.0125, audio_tagging_loss=0.00871, over 3045543.56 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:59:32,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3404673.3333333335, ans=0.125 2023-11-28 06:59:32,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.734e+01 9.325e+01 1.007e+02 1.153e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 06:59:34,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3404673.3333333335, ans=0.125 2023-11-28 06:59:38,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3404740.0, ans=0.95 2023-11-28 06:59:52,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.77 vs. 
limit=22.5 2023-11-28 06:59:56,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3404806.6666666665, ans=0.125 2023-11-28 07:00:04,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3404873.3333333335, ans=0.125 2023-11-28 07:00:09,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3404873.3333333335, ans=0.07 2023-11-28 07:00:10,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3404873.3333333335, ans=0.035 2023-11-28 07:00:18,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2023-11-28 07:00:18,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3404940.0, ans=0.125 2023-11-28 07:00:21,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-28 07:00:24,728 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5750, loss[loss=0.06133, simple_loss=0.08381, pruned_loss=0.009454, audio_tagging_loss=0.009969, over 15225.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08979, pruned_loss=0.01237, audio_tagging_loss=0.008633, over 3048105.89 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:00:54,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3405140.0, ans=0.125 2023-11-28 07:01:18,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-28 07:01:22,751 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5800, loss[loss=0.04855, simple_loss=0.06295, pruned_loss=0.008037, audio_tagging_loss=0.009041, over 14806.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09012, pruned_loss=0.01252, audio_tagging_loss=0.00861, over 3043113.28 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:01:24,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. 
limit=15.0 2023-11-28 07:01:28,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.777e+01 9.348e+01 1.032e+02 1.624e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 07:01:40,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3405406.6666666665, ans=0.2 2023-11-28 07:01:40,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3405406.6666666665, ans=0.0 2023-11-28 07:01:50,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3405473.3333333335, ans=0.125 2023-11-28 07:02:04,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3405540.0, ans=0.1 2023-11-28 07:02:15,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3405606.6666666665, ans=0.125 2023-11-28 07:02:16,399 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-28 07:02:19,678 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5850, loss[loss=0.0764, simple_loss=0.1023, pruned_loss=0.01862, audio_tagging_loss=0.00662, over 14383.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08969, pruned_loss=0.01246, audio_tagging_loss=0.00862, over 3038142.47 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:02:21,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-28 07:02:59,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3405873.3333333335, ans=0.1 2023-11-28 07:03:13,189 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-28 07:03:16,957 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5900, loss[loss=0.05792, simple_loss=0.08265, pruned_loss=0.009298, audio_tagging_loss=0.007302, over 15746.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09068, pruned_loss=0.01254, audio_tagging_loss=0.008565, over 3039913.73 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:03:22,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.808e+01 9.419e+01 9.961e+01 1.259e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:03:35,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3406073.3333333335, ans=0.0 2023-11-28 07:03:38,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3406073.3333333335, ans=0.125 2023-11-28 07:03:41,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3406140.0, ans=0.2 2023-11-28 07:03:46,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3406140.0, ans=0.125 2023-11-28 07:04:11,268 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-28 07:04:13,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3406340.0, ans=0.0 2023-11-28 07:04:14,903 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 5950, loss[loss=0.05604, simple_loss=0.07735, pruned_loss=0.007105, audio_tagging_loss=0.01026, over 15086.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09132, pruned_loss=0.01253, audio_tagging_loss=0.008606, over 3044646.77 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:04:23,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-11-28 07:04:27,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=22.5 2023-11-28 07:04:31,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3406406.6666666665, ans=0.0 2023-11-28 07:05:00,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3406606.6666666665, ans=0.0 2023-11-28 07:05:08,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3406606.6666666665, ans=0.125 2023-11-28 07:05:09,084 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-28 07:05:12,616 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6000, loss[loss=0.06274, simple_loss=0.07866, pruned_loss=0.0141, audio_tagging_loss=0.009306, over 15638.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09016, pruned_loss=0.01246, audio_tagging_loss=0.008677, over 3040870.93 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:05:12,616 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 07:05:25,319 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0629, 2.3120, 3.6937, 3.6451, 3.4810, 3.6733, 3.4529, 3.6638], device='cuda:1') 2023-11-28 07:05:47,609 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.0577, simple_loss=0.05058, pruned_loss=0.005244, audio_tagging_loss=0.02717, over 4681554.00 frames. 
2023-11-28 07:05:47,610 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 07:05:47,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406673.3333333335, ans=0.1 2023-11-28 07:05:53,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.750e+01 9.275e+01 1.001e+02 1.273e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:06:07,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3406740.0, ans=0.1 2023-11-28 07:06:17,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406806.6666666665, ans=0.1 2023-11-28 07:06:24,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406873.3333333335, ans=0.1 2023-11-28 07:06:30,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3406873.3333333335, ans=0.125 2023-11-28 07:06:32,438 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:06:33,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2023-11-28 07:06:37,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406940.0, ans=0.1 2023-11-28 07:06:41,938 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-28 07:06:43,677 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:06:45,688 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6050, loss[loss=0.05452, simple_loss=0.07494, pruned_loss=0.009715, audio_tagging_loss=0.00734, over 15240.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09049, pruned_loss=0.01259, audio_tagging_loss=0.00859, over 3045624.57 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:07:03,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3407073.3333333335, ans=0.125 2023-11-28 07:07:18,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3407206.6666666665, ans=0.0 2023-11-28 07:07:32,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3407273.3333333335, ans=0.1 2023-11-28 07:07:39,119 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-28 07:07:42,399 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6100, loss[loss=0.06559, simple_loss=0.0901, pruned_loss=0.01134, audio_tagging_loss=0.009199, over 14250.00 frames. 
], tot_loss[loss=0.06594, simple_loss=0.08985, pruned_loss=0.01235, audio_tagging_loss=0.008667, over 3051199.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:07:47,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.837e+01 9.364e+01 1.005e+02 1.238e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 07:07:59,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3407406.6666666665, ans=0.1 2023-11-28 07:08:12,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2023-11-28 07:08:36,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-28 07:08:39,905 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6150, loss[loss=0.05378, simple_loss=0.07868, pruned_loss=0.006166, audio_tagging_loss=0.008274, over 14796.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09024, pruned_loss=0.01239, audio_tagging_loss=0.008715, over 3051632.24 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:08:43,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2023-11-28 07:08:55,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3407740.0, ans=0.125 2023-11-28 07:08:56,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.07 vs. limit=10.0 2023-11-28 07:09:25,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3407940.0, ans=0.05 2023-11-28 07:09:33,558 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-28 07:09:37,600 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6200, loss[loss=0.05849, simple_loss=0.08295, pruned_loss=0.007747, audio_tagging_loss=0.009267, over 16424.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08977, pruned_loss=0.01225, audio_tagging_loss=0.008733, over 3058196.24 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:09:43,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.472e+01 8.632e+01 9.387e+01 1.018e+02 1.235e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:09:45,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 2023-11-28 07:09:51,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-11-28 07:10:10,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.52 vs. 
limit=15.0 2023-11-28 07:10:14,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3408206.6666666665, ans=0.0 2023-11-28 07:10:14,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3408206.6666666665, ans=0.125 2023-11-28 07:10:15,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-28 07:10:24,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3408273.3333333335, ans=0.0 2023-11-28 07:10:27,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3408273.3333333335, ans=0.125 2023-11-28 07:10:31,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-28 07:10:34,782 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6250, loss[loss=0.07159, simple_loss=0.09221, pruned_loss=0.01595, audio_tagging_loss=0.009534, over 15830.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08959, pruned_loss=0.01226, audio_tagging_loss=0.008796, over 3053900.24 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:10:40,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3408340.0, ans=0.0 2023-11-28 07:11:07,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2023-11-28 07:11:13,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3408540.0, ans=0.0 2023-11-28 07:11:22,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3408606.6666666665, ans=0.125 2023-11-28 07:11:28,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-28 07:11:32,057 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6300, loss[loss=0.06898, simple_loss=0.09664, pruned_loss=0.01114, audio_tagging_loss=0.009516, over 15273.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08942, pruned_loss=0.01226, audio_tagging_loss=0.00887, over 3046944.76 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:11:38,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.880e+01 9.504e+01 1.024e+02 1.327e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 07:11:40,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3408673.3333333335, ans=0.125 2023-11-28 07:11:58,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3408806.6666666665, ans=0.0 2023-11-28 07:12:26,529 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-28 07:12:29,723 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6350, loss[loss=0.07045, simple_loss=0.1024, pruned_loss=0.01182, audio_tagging_loss=0.007417, over 14676.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08963, pruned_loss=0.01228, audio_tagging_loss=0.008902, over 3047966.71 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:12:36,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3409006.6666666665, ans=0.125 2023-11-28 07:12:58,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-11-28 07:13:24,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-28 07:13:28,601 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6400, loss[loss=0.06901, simple_loss=0.09209, pruned_loss=0.01199, audio_tagging_loss=0.01098, over 15068.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08996, pruned_loss=0.01237, audio_tagging_loss=0.008993, over 3044286.45 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:13:35,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.831e+01 9.327e+01 9.903e+01 1.480e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:13:43,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3409406.6666666665, ans=0.125 2023-11-28 07:14:01,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3409540.0, ans=0.025 2023-11-28 07:14:04,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3409540.0, ans=0.1 2023-11-28 07:14:08,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3409540.0, ans=0.04949747468305833 2023-11-28 07:14:15,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3409606.6666666665, ans=0.0 2023-11-28 07:14:17,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3409606.6666666665, ans=10.0 2023-11-28 07:14:21,604 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-28 07:14:24,819 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6450, loss[loss=0.06822, simple_loss=0.1004, pruned_loss=0.009005, audio_tagging_loss=0.009039, over 16716.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09028, pruned_loss=0.01239, audio_tagging_loss=0.009008, over 3042300.54 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:14:32,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2023-11-28 07:14:44,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3409740.0, ans=0.0 2023-11-28 07:15:18,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-28 07:15:21,729 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6500, loss[loss=0.08476, simple_loss=0.1142, pruned_loss=0.02002, audio_tagging_loss=0.007642, over 16057.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09032, pruned_loss=0.01233, audio_tagging_loss=0.008901, over 3053200.19 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:15:28,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.791e+01 9.611e+01 1.014e+02 1.471e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 07:15:36,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3410073.3333333335, ans=0.015 2023-11-28 07:15:51,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3410140.0, ans=0.07 2023-11-28 07:15:56,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3410206.6666666665, ans=0.125 2023-11-28 07:16:08,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=15.0 2023-11-28 07:16:11,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3410273.3333333335, ans=0.1 2023-11-28 07:16:15,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-28 07:16:16,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3410273.3333333335, ans=0.125 2023-11-28 07:16:19,183 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6550, loss[loss=0.06761, simple_loss=0.09159, pruned_loss=0.01526, audio_tagging_loss=0.006554, over 14241.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09066, pruned_loss=0.0125, audio_tagging_loss=0.008732, over 3055555.56 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:16:19,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3410340.0, ans=0.0 2023-11-28 07:16:27,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3410340.0, ans=0.125 2023-11-28 07:16:31,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3410406.6666666665, ans=0.0 2023-11-28 07:16:42,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2023-11-28 07:16:45,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3410473.3333333335, ans=0.0 2023-11-28 07:16:52,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3410540.0, ans=0.0 2023-11-28 07:16:58,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3410540.0, ans=0.125 2023-11-28 07:17:00,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3410540.0, ans=0.125 2023-11-28 07:17:01,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3410540.0, ans=0.07 2023-11-28 07:17:12,748 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-28 07:17:16,243 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6600, loss[loss=0.06323, simple_loss=0.08572, pruned_loss=0.01265, audio_tagging_loss=0.007721, over 15117.00 frames. 
], tot_loss[loss=0.06647, simple_loss=0.09078, pruned_loss=0.01248, audio_tagging_loss=0.008606, over 3058197.34 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:17:18,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3410673.3333333335, ans=0.125 2023-11-28 07:17:19,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3410673.3333333335, ans=0.125 2023-11-28 07:17:20,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3410673.3333333335, ans=0.1 2023-11-28 07:17:24,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.683e+01 9.479e+01 1.018e+02 1.462e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 07:17:24,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3410673.3333333335, ans=0.125 2023-11-28 07:17:30,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3410740.0, ans=0.5 2023-11-28 07:17:32,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-11-28 07:17:36,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3410740.0, ans=0.0 2023-11-28 07:17:56,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3410873.3333333335, ans=0.0 2023-11-28 07:17:58,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.90 vs. limit=15.0 2023-11-28 07:18:07,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3410940.0, ans=0.0 2023-11-28 07:18:09,962 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-28 07:18:13,104 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6650, loss[loss=0.06485, simple_loss=0.09079, pruned_loss=0.0109, audio_tagging_loss=0.008557, over 14862.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09117, pruned_loss=0.01249, audio_tagging_loss=0.008582, over 3049145.71 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:18:15,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3411006.6666666665, ans=0.125 2023-11-28 07:18:24,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3411073.3333333335, ans=0.2 2023-11-28 07:18:24,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.97 vs. 
limit=12.0 2023-11-28 07:18:30,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3411073.3333333335, ans=0.0 2023-11-28 07:18:31,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3411073.3333333335, ans=0.0 2023-11-28 07:19:07,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-28 07:19:10,398 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6700, loss[loss=0.0555, simple_loss=0.07149, pruned_loss=0.01144, audio_tagging_loss=0.008312, over 16968.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09077, pruned_loss=0.01269, audio_tagging_loss=0.008514, over 3041317.62 frames. ], batch size: 70, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:19:13,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3411340.0, ans=0.0 2023-11-28 07:19:17,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 9.075e+01 9.531e+01 1.012e+02 1.694e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 07:19:21,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3411406.6666666665, ans=0.125 2023-11-28 07:19:45,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3411540.0, ans=0.5 2023-11-28 07:19:55,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411606.6666666665, ans=0.0 2023-11-28 07:20:01,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3411606.6666666665, ans=0.0 2023-11-28 07:20:03,892 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-28 07:20:07,123 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6750, loss[loss=0.07519, simple_loss=0.09178, pruned_loss=0.01975, audio_tagging_loss=0.009546, over 14952.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08956, pruned_loss=0.0124, audio_tagging_loss=0.008546, over 3035444.20 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:20:19,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3411740.0, ans=0.2 2023-11-28 07:20:24,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3411740.0, ans=0.2 2023-11-28 07:20:44,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3411873.3333333335, ans=0.125 2023-11-28 07:20:50,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3411873.3333333335, ans=0.125 2023-11-28 07:21:00,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-28 07:21:01,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3411940.0, ans=0.04949747468305833 2023-11-28 07:21:03,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3412006.6666666665, ans=0.125 2023-11-28 07:21:04,753 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6800, loss[loss=0.056, simple_loss=0.07047, pruned_loss=0.01046, audio_tagging_loss=0.01031, over 16029.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08894, pruned_loss=0.01246, audio_tagging_loss=0.008589, over 3034984.92 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:21:05,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3412006.6666666665, ans=0.0 2023-11-28 07:21:12,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.904e+01 9.309e+01 9.890e+01 1.281e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 07:21:29,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3412140.0, ans=0.0 2023-11-28 07:21:35,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3412140.0, ans=0.125 2023-11-28 07:21:58,887 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-28 07:22:02,598 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6850, loss[loss=0.06903, simple_loss=0.09275, pruned_loss=0.01587, audio_tagging_loss=0.006792, over 14752.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08937, pruned_loss=0.01232, audio_tagging_loss=0.008592, over 3036082.96 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:22:12,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3412406.6666666665, ans=0.2 2023-11-28 07:22:20,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-28 07:22:20,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.26 vs. 
limit=10.0 2023-11-28 07:22:26,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3412473.3333333335, ans=0.0 2023-11-28 07:22:56,297 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-28 07:22:56,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=22.5 2023-11-28 07:22:59,475 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6900, loss[loss=0.05918, simple_loss=0.08437, pruned_loss=0.00974, audio_tagging_loss=0.007256, over 15184.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09037, pruned_loss=0.01234, audio_tagging_loss=0.00856, over 3031027.02 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:23:04,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3412673.3333333335, ans=0.09899494936611666 2023-11-28 07:23:07,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.771e+01 9.385e+01 1.023e+02 1.493e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:23:07,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3412673.3333333335, ans=0.125 2023-11-28 07:23:09,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3412740.0, ans=0.0 2023-11-28 07:23:23,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2023-11-28 07:23:48,860 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:23:49,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3412940.0, ans=0.125 2023-11-28 07:23:53,311 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-28 07:23:57,051 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 6950, loss[loss=0.06227, simple_loss=0.08325, pruned_loss=0.01314, audio_tagging_loss=0.007504, over 15045.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09057, pruned_loss=0.01243, audio_tagging_loss=0.008587, over 3030109.01 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:24:03,438 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:24:03,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3413006.6666666665, ans=0.2 2023-11-28 07:24:12,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. 
limit=15.0 2023-11-28 07:24:14,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3413073.3333333335, ans=15.0 2023-11-28 07:24:15,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3413073.3333333335, ans=0.0 2023-11-28 07:24:22,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3413140.0, ans=0.1 2023-11-28 07:24:30,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.47 vs. limit=10.0 2023-11-28 07:24:48,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3413273.3333333335, ans=0.125 2023-11-28 07:24:51,280 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-28 07:24:57,488 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7000, loss[loss=0.06606, simple_loss=0.09201, pruned_loss=0.01148, audio_tagging_loss=0.008575, over 15243.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08972, pruned_loss=0.01234, audio_tagging_loss=0.008655, over 3031332.36 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:25:06,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.579e+01 9.421e+01 1.029e+02 1.258e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:25:10,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3413406.6666666665, ans=0.0 2023-11-28 07:25:19,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3413473.3333333335, ans=0.125 2023-11-28 07:25:25,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-11-28 07:25:48,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3413606.6666666665, ans=0.0 2023-11-28 07:25:50,605 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-28 07:25:50,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3413606.6666666665, ans=0.125 2023-11-28 07:25:53,853 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7050, loss[loss=0.07658, simple_loss=0.0982, pruned_loss=0.01891, audio_tagging_loss=0.008574, over 13889.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08925, pruned_loss=0.01234, audio_tagging_loss=0.008693, over 3026385.49 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:07,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3413740.0, ans=0.125 2023-11-28 07:26:09,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.81 vs. 
limit=15.0 2023-11-28 07:26:20,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3413806.6666666665, ans=0.125 2023-11-28 07:26:40,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3413940.0, ans=0.0 2023-11-28 07:26:43,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413940.0, ans=0.1 2023-11-28 07:26:46,970 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-28 07:26:50,148 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7100, loss[loss=0.06591, simple_loss=0.09341, pruned_loss=0.01096, audio_tagging_loss=0.008244, over 15695.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08885, pruned_loss=0.01231, audio_tagging_loss=0.00873, over 3033714.18 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:54,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5 2023-11-28 07:26:55,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3414006.6666666665, ans=0.1 2023-11-28 07:27:01,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 9.094e+01 9.538e+01 1.011e+02 1.389e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 07:27:07,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3414073.3333333335, ans=0.0 2023-11-28 07:27:16,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3414140.0, ans=0.0 2023-11-28 07:27:28,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3414206.6666666665, ans=0.07 2023-11-28 07:27:34,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-28 07:27:44,902 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-28 07:27:48,106 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7150, loss[loss=0.06084, simple_loss=0.07718, pruned_loss=0.01057, audio_tagging_loss=0.01168, over 13898.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08961, pruned_loss=0.01237, audio_tagging_loss=0.00877, over 3033081.47 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 4.0 2023-11-28 07:28:01,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3414406.6666666665, ans=0.0 2023-11-28 07:28:18,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3414473.3333333335, ans=0.2 2023-11-28 07:28:35,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3414606.6666666665, ans=0.125 2023-11-28 07:28:41,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-28 07:28:44,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-28 07:28:45,581 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7200, loss[loss=0.07708, simple_loss=0.1082, pruned_loss=0.01641, audio_tagging_loss=0.006577, over 15760.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08984, pruned_loss=0.01243, audio_tagging_loss=0.008828, over 3038040.76 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:28:51,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3414673.3333333335, ans=0.1 2023-11-28 07:28:54,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-28 07:28:56,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 8.861e+01 9.668e+01 1.042e+02 2.032e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-28 07:29:15,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.35 vs. limit=22.5 2023-11-28 07:29:18,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3414806.6666666665, ans=0.0 2023-11-28 07:29:27,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3414873.3333333335, ans=0.1 2023-11-28 07:29:34,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3414940.0, ans=0.05 2023-11-28 07:29:34,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3414940.0, ans=0.125 2023-11-28 07:29:37,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3414940.0, ans=0.05 2023-11-28 07:29:38,933 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-28 07:29:42,138 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7250, loss[loss=0.05449, simple_loss=0.07361, pruned_loss=0.00577, audio_tagging_loss=0.01191, over 14374.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08934, pruned_loss=0.01227, audio_tagging_loss=0.008969, over 3038046.10 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:29:54,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3415073.3333333335, ans=0.125 2023-11-28 07:30:05,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3415140.0, ans=0.0 2023-11-28 07:30:23,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3415206.6666666665, ans=0.125 2023-11-28 07:30:30,209 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:30:34,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-28 07:30:36,119 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-28 07:30:39,342 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7300, loss[loss=0.09312, simple_loss=0.1327, pruned_loss=0.02094, audio_tagging_loss=0.005812, over 16099.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08888, pruned_loss=0.01226, audio_tagging_loss=0.008943, over 3038587.38 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:30:48,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3415340.0, ans=0.0 2023-11-28 07:30:51,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.861e+01 9.411e+01 1.033e+02 1.259e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:30:53,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3415406.6666666665, ans=0.2 2023-11-28 07:31:09,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-28 07:31:12,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3415540.0, ans=0.125 2023-11-28 07:31:13,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3415540.0, ans=0.2 2023-11-28 07:31:33,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-28 07:31:36,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3415673.3333333335, ans=0.0 2023-11-28 07:31:37,008 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7350, loss[loss=0.06808, simple_loss=0.1034, pruned_loss=0.009908, audio_tagging_loss=0.006466, over 15480.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08972, pruned_loss=0.01231, audio_tagging_loss=0.008737, over 3042410.73 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:31:47,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415740.0, ans=0.1 2023-11-28 07:31:50,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3415740.0, ans=0.125 2023-11-28 07:31:52,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3415740.0, ans=0.125 2023-11-28 07:32:05,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-11-28 07:32:23,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3415940.0, ans=0.09899494936611666 2023-11-28 07:32:24,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3415940.0, ans=0.125 2023-11-28 07:32:25,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3415940.0, ans=0.2 2023-11-28 07:32:29,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-28 07:32:33,360 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7400, loss[loss=0.06924, simple_loss=0.1025, pruned_loss=0.009439, audio_tagging_loss=0.008551, over 15645.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08949, pruned_loss=0.01234, audio_tagging_loss=0.008673, over 3042066.30 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:32:40,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3416006.6666666665, ans=0.125 2023-11-28 07:32:43,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3416006.6666666665, ans=0.0 2023-11-28 07:32:44,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.822e+01 9.327e+01 1.016e+02 1.231e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:32:46,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3416073.3333333335, ans=0.125 2023-11-28 07:32:50,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3416073.3333333335, ans=0.0 2023-11-28 07:33:27,699 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-28 07:33:30,878 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7450, loss[loss=0.05681, simple_loss=0.08111, pruned_loss=0.009057, audio_tagging_loss=0.007198, over 15154.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08908, pruned_loss=0.01221, audio_tagging_loss=0.008633, over 3043959.53 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:33:54,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3416473.3333333335, ans=0.125 2023-11-28 07:34:11,394 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:34:13,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3416540.0, ans=0.0 2023-11-28 07:34:26,012 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-28 07:34:29,285 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7500, loss[loss=0.06776, simple_loss=0.09455, pruned_loss=0.01208, audio_tagging_loss=0.008407, over 14984.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08942, pruned_loss=0.01219, audio_tagging_loss=0.008614, over 3044303.87 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:34:30,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3416673.3333333335, ans=0.2 2023-11-28 07:34:40,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.775e+01 9.275e+01 9.988e+01 1.436e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:34:42,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2023-11-28 07:34:45,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3416740.0, ans=0.125 2023-11-28 07:35:22,966 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-28 07:35:26,336 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7550, loss[loss=0.05549, simple_loss=0.07014, pruned_loss=0.00787, audio_tagging_loss=0.01255, over 14907.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08944, pruned_loss=0.01208, audio_tagging_loss=0.008617, over 3047308.89 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:35:33,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3417006.6666666665, ans=0.1 2023-11-28 07:36:02,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3417206.6666666665, ans=0.1 2023-11-28 07:36:04,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3417206.6666666665, ans=0.0 2023-11-28 07:36:04,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-28 07:36:07,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3417206.6666666665, ans=0.0 2023-11-28 07:36:20,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-28 07:36:25,123 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7600, loss[loss=0.05288, simple_loss=0.06461, pruned_loss=0.01014, audio_tagging_loss=0.01044, over 14554.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08878, pruned_loss=0.01205, audio_tagging_loss=0.008634, over 3045265.67 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:36:29,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3417340.0, ans=0.125 2023-11-28 07:36:36,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.736e+01 9.227e+01 9.964e+01 1.331e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 07:36:39,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3417406.6666666665, ans=0.125 2023-11-28 07:37:20,211 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-28 07:37:23,487 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7650, loss[loss=0.07176, simple_loss=0.1076, pruned_loss=0.01117, audio_tagging_loss=0.006794, over 15058.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08899, pruned_loss=0.01203, audio_tagging_loss=0.008614, over 3048002.16 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:37:33,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3417673.3333333335, ans=0.0 2023-11-28 07:37:44,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3417740.0, ans=0.2 2023-11-28 07:38:18,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-28 07:38:20,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3417940.0, ans=0.07 2023-11-28 07:38:22,138 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7700, loss[loss=0.07059, simple_loss=0.09699, pruned_loss=0.01352, audio_tagging_loss=0.008575, over 15281.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08981, pruned_loss=0.01215, audio_tagging_loss=0.00857, over 3044013.00 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:38:24,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418006.6666666665, ans=0.1 2023-11-28 07:38:34,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.901e+01 9.400e+01 1.006e+02 1.251e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 07:38:45,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3418140.0, ans=0.125 2023-11-28 07:39:17,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3418206.6666666665, ans=0.125 2023-11-28 07:39:52,014 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-28 07:40:09,432 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7750, loss[loss=0.05558, simple_loss=0.0742, pruned_loss=0.01137, audio_tagging_loss=0.007106, over 15368.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08965, pruned_loss=0.01232, audio_tagging_loss=0.008562, over 3052038.73 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:40:54,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3418406.6666666665, ans=0.2 2023-11-28 07:41:22,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3418406.6666666665, ans=0.5 2023-11-28 07:43:26,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=15.0 2023-11-28 07:43:32,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-28 07:43:46,041 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7800, loss[loss=0.06796, simple_loss=0.0969, pruned_loss=0.01093, audio_tagging_loss=0.008582, over 15682.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08942, pruned_loss=0.01218, audio_tagging_loss=0.008653, over 3052631.02 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:43:46,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3418673.3333333335, ans=10.0 2023-11-28 07:44:31,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.859e+01 9.420e+01 1.032e+02 1.560e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:45:50,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-11-28 07:46:23,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3418940.0, ans=0.125 2023-11-28 07:46:38,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3418940.0, ans=0.125 2023-11-28 07:46:49,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-28 07:47:05,901 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7850, loss[loss=0.05389, simple_loss=0.07323, pruned_loss=0.008987, audio_tagging_loss=0.008285, over 15033.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08934, pruned_loss=0.01203, audio_tagging_loss=0.008671, over 3050159.14 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:47:48,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2023-11-28 07:48:18,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2023-11-28 07:48:21,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3419073.3333333335, ans=0.07 2023-11-28 07:49:34,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3419206.6666666665, ans=0.0 2023-11-28 07:50:32,234 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-28 07:50:44,921 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7900, loss[loss=0.08427, simple_loss=0.1159, pruned_loss=0.01607, audio_tagging_loss=0.01023, over 15059.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08977, pruned_loss=0.01215, audio_tagging_loss=0.008708, over 3053944.39 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:50:55,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3419340.0, ans=0.125 2023-11-28 07:51:24,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3419406.6666666665, ans=0.0 2023-11-28 07:51:24,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3419406.6666666665, ans=0.125 2023-11-28 07:51:25,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 8.861e+01 9.655e+01 1.039e+02 1.530e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 07:51:38,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3419406.6666666665, ans=0.125 2023-11-28 07:51:39,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2023-11-28 07:52:08,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3419473.3333333335, ans=0.07 2023-11-28 07:52:36,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3419540.0, ans=0.125 2023-11-28 07:52:40,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3419540.0, ans=0.2 2023-11-28 07:53:42,944 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-28 07:53:45,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419606.6666666665, ans=0.1 2023-11-28 07:53:53,022 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 7950, loss[loss=0.06615, simple_loss=0.08452, pruned_loss=0.01713, audio_tagging_loss=0.006767, over 14735.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08959, pruned_loss=0.0122, audio_tagging_loss=0.008818, over 3054494.24 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:54:51,228 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:54:58,406 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:55:43,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2023-11-28 07:56:07,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3419873.3333333335, ans=0.125 2023-11-28 07:57:25,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-28 07:57:41,278 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8000, loss[loss=0.06228, simple_loss=0.08429, pruned_loss=0.01244, audio_tagging_loss=0.007695, over 14911.00 frames. 
], tot_loss[loss=0.0659, simple_loss=0.08972, pruned_loss=0.01216, audio_tagging_loss=0.008875, over 3054523.20 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:58:16,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3420006.6666666665, ans=0.0 2023-11-28 07:58:27,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.711e+01 9.409e+01 1.028e+02 1.220e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:58:50,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3420073.3333333335, ans=0.2 2023-11-28 07:58:58,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3420073.3333333335, ans=0.125 2023-11-28 07:59:09,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3420140.0, ans=0.1 2023-11-28 08:01:07,361 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-28 08:01:19,639 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8050, loss[loss=0.04564, simple_loss=0.04821, pruned_loss=0.008741, audio_tagging_loss=0.0128, over 16546.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08967, pruned_loss=0.01214, audio_tagging_loss=0.008878, over 3052975.82 frames. ], batch size: 68, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:01:20,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3420340.0, ans=15.0 2023-11-28 08:01:23,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3420340.0, ans=0.95 2023-11-28 08:01:42,698 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:02:12,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3420406.6666666665, ans=0.125 2023-11-28 08:03:06,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420540.0, ans=0.1 2023-11-28 08:03:29,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3420540.0, ans=0.125 2023-11-28 08:03:38,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3420606.6666666665, ans=0.125 2023-11-28 08:04:01,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-28 08:04:03,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=12.0 2023-11-28 08:04:12,027 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8100, loss[loss=0.06317, simple_loss=0.08883, pruned_loss=0.01134, audio_tagging_loss=0.007417, over 15074.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08965, pruned_loss=0.01221, audio_tagging_loss=0.008812, over 3050691.89 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:04:12,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3420673.3333333335, ans=0.0 2023-11-28 08:04:35,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3420673.3333333335, ans=0.1 2023-11-28 08:04:51,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.964e+01 9.574e+01 1.024e+02 1.325e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 08:05:23,330 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:05:45,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3420806.6666666665, ans=0.125 2023-11-28 08:05:47,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3420806.6666666665, ans=0.125 2023-11-28 08:05:56,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-28 08:06:30,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3420873.3333333335, ans=0.125 2023-11-28 08:06:52,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=12.0 2023-11-28 08:07:02,729 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-28 08:07:11,797 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8150, loss[loss=0.059, simple_loss=0.07556, pruned_loss=0.01137, audio_tagging_loss=0.009853, over 15128.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08991, pruned_loss=0.01236, audio_tagging_loss=0.008738, over 3052876.92 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:08:57,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3421206.6666666665, ans=0.125 2023-11-28 08:10:00,147 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513200 2023-11-28 08:10:09,182 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8200, loss[loss=0.07734, simple_loss=0.1017, pruned_loss=0.01682, audio_tagging_loss=0.009679, over 14911.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09018, pruned_loss=0.01249, audio_tagging_loss=0.008653, over 3053164.69 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:10:20,999 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 08:10:48,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.671e+01 9.315e+01 1.033e+02 1.596e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 08:11:02,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3421406.6666666665, ans=0.2 2023-11-28 08:11:16,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3421473.3333333335, ans=0.125 2023-11-28 08:11:38,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3421473.3333333335, ans=0.125 2023-11-28 08:11:52,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-28 08:12:30,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3421606.6666666665, ans=0.09899494936611666 2023-11-28 08:12:54,369 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513250 2023-11-28 08:13:04,420 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8250, loss[loss=0.07151, simple_loss=0.1002, pruned_loss=0.01349, audio_tagging_loss=0.007911, over 14947.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09064, pruned_loss=0.01247, audio_tagging_loss=0.008566, over 3051995.08 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:13:15,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3421673.3333333335, ans=0.0 2023-11-28 08:13:15,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3421673.3333333335, ans=0.5 2023-11-28 08:13:26,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3421673.3333333335, ans=0.125 2023-11-28 08:15:02,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3421873.3333333335, ans=0.2 2023-11-28 08:15:29,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3421940.0, ans=0.125 2023-11-28 08:15:51,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3421940.0, ans=0.1 2023-11-28 08:15:54,569 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513300 2023-11-28 08:15:54,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3421940.0, ans=0.125 2023-11-28 08:16:07,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.22 vs. limit=15.0 2023-11-28 08:16:08,482 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8300, loss[loss=0.08077, simple_loss=0.1131, pruned_loss=0.017, audio_tagging_loss=0.007237, over 14925.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09103, pruned_loss=0.0125, audio_tagging_loss=0.008561, over 3057288.39 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:16:14,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-28 08:16:49,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.832e+01 9.492e+01 1.019e+02 1.242e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 08:16:57,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3422073.3333333335, ans=0.125 2023-11-28 08:18:07,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2023-11-28 08:18:34,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3422206.6666666665, ans=0.2 2023-11-28 08:19:10,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513350 2023-11-28 08:19:21,018 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8350, loss[loss=0.07864, simple_loss=0.1176, pruned_loss=0.01363, audio_tagging_loss=0.006209, over 15469.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09111, pruned_loss=0.01246, audio_tagging_loss=0.008519, over 3052082.83 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:19:30,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3422340.0, ans=0.125 2023-11-28 08:21:04,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-11-28 08:21:04,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.60 vs. limit=6.0 2023-11-28 08:21:11,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3422540.0, ans=0.0 2023-11-28 08:21:56,003 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513400 2023-11-28 08:22:05,638 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8400, loss[loss=0.04852, simple_loss=0.06417, pruned_loss=0.007295, audio_tagging_loss=0.009139, over 14566.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09074, pruned_loss=0.01235, audio_tagging_loss=0.008508, over 3053256.56 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:22:35,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.870e+01 9.331e+01 1.011e+02 1.281e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 08:22:46,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.09 vs. 
limit=22.5 2023-11-28 08:23:08,769 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:23:19,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3422806.6666666665, ans=0.125 2023-11-28 08:23:33,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3422873.3333333335, ans=0.0 2023-11-28 08:24:19,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513450 2023-11-28 08:24:27,125 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8450, loss[loss=0.05848, simple_loss=0.072, pruned_loss=0.01136, audio_tagging_loss=0.01112, over 15332.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09008, pruned_loss=0.01226, audio_tagging_loss=0.008503, over 3051734.58 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:25:16,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3423140.0, ans=0.125 2023-11-28 08:25:21,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3423140.0, ans=0.0 2023-11-28 08:25:51,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3423206.6666666665, ans=0.125 2023-11-28 08:26:29,693 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513500 2023-11-28 08:26:35,542 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8500, loss[loss=0.07817, simple_loss=0.1139, pruned_loss=0.0152, audio_tagging_loss=0.006017, over 16642.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09017, pruned_loss=0.01234, audio_tagging_loss=0.008555, over 3053641.24 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:26:56,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. limit=10.0 2023-11-28 08:27:00,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3423406.6666666665, ans=0.125 2023-11-28 08:27:06,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.890e+01 9.437e+01 1.019e+02 2.913e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-28 08:28:27,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2023-11-28 08:28:36,277 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513550 2023-11-28 08:28:41,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3423606.6666666665, ans=0.0 2023-11-28 08:28:44,903 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8550, loss[loss=0.05125, simple_loss=0.06902, pruned_loss=0.008076, audio_tagging_loss=0.008662, over 13252.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0893, pruned_loss=0.01217, audio_tagging_loss=0.008572, over 3046194.59 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:28:51,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423673.3333333335, ans=0.1 2023-11-28 08:29:01,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3423673.3333333335, ans=0.04949747468305833 2023-11-28 08:29:14,118 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:29:18,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3423740.0, ans=0.1 2023-11-28 08:29:21,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3423740.0, ans=0.125 2023-11-28 08:29:52,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3423806.6666666665, ans=0.125 2023-11-28 08:29:56,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423873.3333333335, ans=0.1 2023-11-28 08:30:30,731 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-28 08:30:37,898 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8600, loss[loss=0.0762, simple_loss=0.1067, pruned_loss=0.0164, audio_tagging_loss=0.006459, over 15712.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.0883, pruned_loss=0.01205, audio_tagging_loss=0.008617, over 3044277.05 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:30:52,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3424006.6666666665, ans=0.0 2023-11-28 08:30:55,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-11-28 08:30:57,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.892e+01 9.588e+01 1.028e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 08:31:24,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3424140.0, ans=0.1 2023-11-28 08:31:27,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3424140.0, ans=0.125 2023-11-28 08:32:09,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513650 2023-11-28 08:32:14,164 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8650, loss[loss=0.07436, simple_loss=0.09996, pruned_loss=0.01716, audio_tagging_loss=0.007219, over 17161.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08832, pruned_loss=0.01202, audio_tagging_loss=0.00867, over 3050698.41 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:32:26,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. 
limit=10.0 2023-11-28 08:32:42,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3424406.6666666665, ans=0.0 2023-11-28 08:32:56,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3424473.3333333335, ans=0.0 2023-11-28 08:32:58,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2023-11-28 08:33:09,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-28 08:33:13,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3424540.0, ans=0.07 2023-11-28 08:33:15,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3424540.0, ans=0.125 2023-11-28 08:33:29,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3424540.0, ans=0.1 2023-11-28 08:33:45,769 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513700 2023-11-28 08:33:45,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3424606.6666666665, ans=0.5 2023-11-28 08:33:46,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3424606.6666666665, ans=0.125 2023-11-28 08:33:50,744 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8700, loss[loss=0.08054, simple_loss=0.1093, pruned_loss=0.0162, audio_tagging_loss=0.009662, over 15188.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08982, pruned_loss=0.01215, audio_tagging_loss=0.008687, over 3056924.23 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:34:09,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3424740.0, ans=0.125 2023-11-28 08:34:13,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.836e+01 9.429e+01 1.013e+02 1.223e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 08:34:20,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3424740.0, ans=0.0 2023-11-28 08:34:24,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3424740.0, ans=0.1 2023-11-28 08:35:07,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3424940.0, ans=0.125 2023-11-28 08:35:15,357 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513750 2023-11-28 08:35:20,374 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8750, loss[loss=0.08758, simple_loss=0.1284, pruned_loss=0.01683, audio_tagging_loss=0.00653, over 15699.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09004, pruned_loss=0.01224, audio_tagging_loss=0.008723, over 3052810.00 frames. 
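In each optim.py:476 record the reported threshold is twice the middle quartile of the recent gradient norms (2.0 × 9.429e+01 ≈ 1.886e+02 here, and 2.0 × 9.437e+01 ≈ 1.887e+02 in the earlier record), i.e. threshold = Clipping_scale × median. Below is a minimal sketch of that rule, assuming the five logged values are (min, Q1, median, Q3, max) over a rolling window of norms; the actual optimizer logic may differ in detail:

```python
import torch

# Sketch of quartile-based gradient clipping. Assumes the five logged
# numbers are (min, Q1, median, Q3, max) over a window of recent norms
# and that threshold = clipping_scale * median, which matches the log
# (2.0 * 9.429e+01 ~= 1.886e+02). The real optim.py may differ.
def clip_by_median(params, recent_norms: list[float],
                   clipping_scale: float = 2.0) -> float:
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()   # 2x the median norm
    total = torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    return float(total)
```

The percent-clipped field then simply counts how often the current batch's norm exceeded that moving threshold, which is why it sits at 0.0 or 1.0 for most records here.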
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:35:27,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3425006.6666666665, ans=0.125 2023-11-28 08:35:44,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3425073.3333333335, ans=0.0 2023-11-28 08:35:59,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3425140.0, ans=0.0 2023-11-28 08:36:04,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3425140.0, ans=0.125 2023-11-28 08:36:25,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=12.0 2023-11-28 08:36:32,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.54 vs. limit=10.0 2023-11-28 08:36:39,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3425273.3333333335, ans=0.0 2023-11-28 08:36:48,767 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513800 2023-11-28 08:36:54,402 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8800, loss[loss=0.06505, simple_loss=0.08733, pruned_loss=0.01277, audio_tagging_loss=0.008611, over 14491.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09005, pruned_loss=0.01224, audio_tagging_loss=0.008805, over 3057763.01 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:37:05,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=8.0 2023-11-28 08:37:13,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.831e+01 9.235e+01 9.998e+01 1.254e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 08:37:37,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3425473.3333333335, ans=0.02 2023-11-28 08:37:46,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3425540.0, ans=0.125 2023-11-28 08:37:52,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3425540.0, ans=0.0 2023-11-28 08:37:55,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3425540.0, ans=0.125 2023-11-28 08:38:10,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513850 2023-11-28 08:38:15,037 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8850, loss[loss=0.06478, simple_loss=0.08226, pruned_loss=0.01607, audio_tagging_loss=0.007579, over 13943.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09027, pruned_loss=0.01229, audio_tagging_loss=0.008822, over 3049578.56 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:38:36,774 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:38:38,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3425740.0, ans=0.125 2023-11-28 08:38:57,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3425806.6666666665, ans=0.125 2023-11-28 08:39:14,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3425873.3333333335, ans=0.125 2023-11-28 08:39:18,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.51 vs. limit=22.5 2023-11-28 08:39:31,450 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513900 2023-11-28 08:39:32,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3425940.0, ans=0.125 2023-11-28 08:39:36,193 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8900, loss[loss=0.07012, simple_loss=0.09842, pruned_loss=0.009985, audio_tagging_loss=0.01093, over 15890.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0911, pruned_loss=0.01238, audio_tagging_loss=0.008664, over 3056549.42 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:39:57,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.722e+01 9.445e+01 1.012e+02 1.187e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 08:39:59,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3426073.3333333335, ans=0.125 2023-11-28 08:40:12,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2023-11-28 08:40:33,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-11-28 08:40:46,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 513950 2023-11-28 08:40:50,574 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 8950, loss[loss=0.07381, simple_loss=0.1026, pruned_loss=0.01507, audio_tagging_loss=0.007449, over 14256.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09033, pruned_loss=0.01225, audio_tagging_loss=0.008598, over 3056317.09 frames. 
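The WARNING above drops a 1-second AudioSet clip: its 100 feature frames shrink to 23 under the encoder's 4× subsampling while its placeholder transcript has 24 BPE tokens, and a transducer cannot emit more tokens than it has encoder frames. A sketch of such a filter follows, using one subsampling formula that reproduces the logged 100 → 23 (an assumption; the real check lives in train_asr.py):

```python
# Sketch of the kind of filter that produces the WARNING above.
# The subsampling arithmetic is an assumption, chosen because it
# reproduces the logged 100 -> 23; the real check is in train_asr.py.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2   # 100 -> 23

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer cannot emit more tokens than it has frames.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # the excluded placeholder cut
```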
], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:40:54,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3426340.0, ans=0.125 2023-11-28 08:41:02,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3426406.6666666665, ans=0.125 2023-11-28 08:41:12,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3426406.6666666665, ans=0.1 2023-11-28 08:41:40,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3426540.0, ans=0.0 2023-11-28 08:41:40,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-28 08:41:42,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3426606.6666666665, ans=0.0 2023-11-28 08:41:52,921 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514000 2023-11-28 08:41:57,104 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9000, loss[loss=0.08603, simple_loss=0.1207, pruned_loss=0.02073, audio_tagging_loss=0.004956, over 15877.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09091, pruned_loss=0.01251, audio_tagging_loss=0.008525, over 3051844.25 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:41:57,105 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 08:42:19,957 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4724, 3.7666, 3.0954, 3.7669], device='cuda:1') 2023-11-28 08:42:24,578 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5857, 3.6169, 3.8250, 3.5380], device='cuda:1') 2023-11-28 08:42:35,479 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05867, simple_loss=0.05056, pruned_loss=0.005241, audio_tagging_loss=0.02815, over 4681554.00 frames. 2023-11-28 08:42:35,481 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 08:42:53,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.891e+01 9.730e+01 1.046e+02 2.169e+02, threshold=1.946e+02, percent-clipped=1.0 2023-11-28 08:43:00,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3426806.6666666665, ans=0.1 2023-11-28 08:43:03,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-28 08:43:06,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.37 vs. 
limit=8.0 2023-11-28 08:43:14,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426873.3333333335, ans=0.1 2023-11-28 08:43:19,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426873.3333333335, ans=0.1 2023-11-28 08:43:33,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426940.0, ans=0.1 2023-11-28 08:43:36,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514050 2023-11-28 08:43:40,606 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9050, loss[loss=0.06555, simple_loss=0.08839, pruned_loss=0.01349, audio_tagging_loss=0.007868, over 14838.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08985, pruned_loss=0.01233, audio_tagging_loss=0.00858, over 3049912.97 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:43:49,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3427006.6666666665, ans=0.125 2023-11-28 08:43:54,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3427073.3333333335, ans=0.125 2023-11-28 08:44:01,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3427073.3333333335, ans=0.0 2023-11-28 08:44:32,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3427273.3333333335, ans=0.0 2023-11-28 08:44:39,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514100 2023-11-28 08:44:43,138 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9100, loss[loss=0.07702, simple_loss=0.1092, pruned_loss=0.01662, audio_tagging_loss=0.005784, over 14910.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09103, pruned_loss=0.01259, audio_tagging_loss=0.008504, over 3047283.72 frames. 
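Most of the scaling.py:213 traffic reports schedule-driven hyperparameters (balancer probabilities, skip rates, dropout_p) whose value `ans` is a function of batch_count; at batch counts past 3.4 million nearly all of them have settled at constants such as 0.125, 0.0 or 0.1. A minimal sketch of a piecewise-linear schedule that behaves this way, assuming ScheduledFloat interpolates between (batch_count, value) breakpoints and then holds its final value:

```python
# Minimal sketch of a piecewise-linear float schedule. Assumes
# ScheduledFloat interpolates between (batch_count, value) breakpoints
# and holds the last value afterwards; the logged ans=0.125 / 0.0 / 0.1
# values are schedules that have long since reached their endpoints.
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# e.g. a dropout that decays 0.3 -> 0.1 over the first 20k batches
# has been flat at 0.1 for millions of batches by this point:
assert scheduled_float(3_423_873.0, [(0.0, 0.3), (20_000.0, 0.1)]) == 0.1
```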
], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:44:43,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3427340.0, ans=0.0 2023-11-28 08:45:01,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 9.021e+01 9.381e+01 1.003e+02 1.228e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 08:45:01,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3427406.6666666665, ans=0.2 2023-11-28 08:45:02,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3427406.6666666665, ans=0.125 2023-11-28 08:45:10,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427473.3333333335, ans=0.1 2023-11-28 08:45:14,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3427473.3333333335, ans=0.2 2023-11-28 08:45:20,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3427540.0, ans=15.0 2023-11-28 08:45:23,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3427540.0, ans=0.1 2023-11-28 08:45:32,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=22.5 2023-11-28 08:45:36,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3427606.6666666665, ans=0.5 2023-11-28 08:45:36,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3427606.6666666665, ans=0.125 2023-11-28 08:45:40,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514150 2023-11-28 08:45:44,477 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9150, loss[loss=0.05957, simple_loss=0.0876, pruned_loss=0.01031, audio_tagging_loss=0.005457, over 14129.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09023, pruned_loss=0.0124, audio_tagging_loss=0.008448, over 3046362.99 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:45:46,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2023-11-28 08:46:00,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3427740.0, ans=0.125 2023-11-28 08:46:00,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3427740.0, ans=0.125 2023-11-28 08:46:13,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3427806.6666666665, ans=0.125 2023-11-28 08:46:39,298 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514200 2023-11-28 08:46:41,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. 
limit=15.0 2023-11-28 08:46:42,869 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9200, loss[loss=0.08042, simple_loss=0.1168, pruned_loss=0.01481, audio_tagging_loss=0.007215, over 15054.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09046, pruned_loss=0.01234, audio_tagging_loss=0.008415, over 3046376.61 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:46:55,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3428073.3333333335, ans=0.1 2023-11-28 08:46:58,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.605e+01 9.339e+01 9.879e+01 1.258e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 08:47:10,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0 2023-11-28 08:47:30,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3428273.3333333335, ans=0.2 2023-11-28 08:47:36,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-11-28 08:47:37,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514250 2023-11-28 08:47:40,306 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9250, loss[loss=0.05782, simple_loss=0.08044, pruned_loss=0.01046, audio_tagging_loss=0.00714, over 14734.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08996, pruned_loss=0.01231, audio_tagging_loss=0.008376, over 3052713.13 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:47:49,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3428340.0, ans=0.125 2023-11-28 08:47:56,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3428406.6666666665, ans=0.125 2023-11-28 08:48:09,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3428473.3333333335, ans=0.0 2023-11-28 08:48:14,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3428540.0, ans=0.2 2023-11-28 08:48:26,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3428606.6666666665, ans=0.125 2023-11-28 08:48:34,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514300 2023-11-28 08:48:37,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3428673.3333333335, ans=0.05 2023-11-28 08:48:38,063 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9300, loss[loss=0.07124, simple_loss=0.1112, pruned_loss=0.009685, audio_tagging_loss=0.005949, over 15393.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09036, pruned_loss=0.01221, audio_tagging_loss=0.008353, over 3055644.45 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:48:52,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3428740.0, ans=0.125 2023-11-28 08:48:52,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3428740.0, ans=0.125 2023-11-28 08:48:54,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.614e+01 9.246e+01 9.788e+01 1.593e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-28 08:49:01,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3428806.6666666665, ans=0.125 2023-11-28 08:49:29,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3428940.0, ans=0.2 2023-11-28 08:49:32,169 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514350 2023-11-28 08:49:35,357 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9350, loss[loss=0.05918, simple_loss=0.07958, pruned_loss=0.008346, audio_tagging_loss=0.01104, over 15287.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0904, pruned_loss=0.01227, audio_tagging_loss=0.00845, over 3054353.29 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:49:53,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.75 vs. limit=10.0 2023-11-28 08:49:54,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3429073.3333333335, ans=0.0 2023-11-28 08:49:55,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3429073.3333333335, ans=0.05 2023-11-28 08:50:08,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3429140.0, ans=0.125 2023-11-28 08:50:08,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3429140.0, ans=0.125 2023-11-28 08:50:11,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3429206.6666666665, ans=0.125 2023-11-28 08:50:19,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3429206.6666666665, ans=0.125 2023-11-28 08:50:28,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514400 2023-11-28 08:50:32,406 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9400, loss[loss=0.07793, simple_loss=0.1063, pruned_loss=0.01696, audio_tagging_loss=0.007795, over 15762.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09042, pruned_loss=0.01224, audio_tagging_loss=0.008516, over 3054124.68 frames. 
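The scaling.py:1022 records compare a per-module whitening metric against a limit (e.g. metric=13.91 vs. limit=15.0 just above). One plausible metric with these properties, assumed here rather than taken from scaling.py, is the normalized spread of the feature-covariance eigenvalues: it equals 1.0 for perfectly white features and grows as the covariance becomes anisotropic:

```python
import torch

# One plausible whitening metric (an assumption, not icefall's exact
# code): d * sum(lambda_i^2) / (sum(lambda_i))^2 over the eigenvalues
# of the feature covariance. Equals 1.0 when all eigenvalues are equal
# (white features) and grows as the covariance becomes anisotropic.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    (_, num_channels) = x.shape
    d = num_channels // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * d:(g + 1) * d]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / xg.shape[0]
        eig = torch.linalg.eigvalsh(cov)
        metrics.append(d * (eig ** 2).sum() / eig.sum() ** 2)
    return float(torch.stack(metrics).mean())

x = torch.randn(1000, 288)                    # near-white input ...
assert abs(whitening_metric(x) - 1.0) < 0.5   # ... metric close to 1
```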
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:50:33,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3429340.0, ans=0.125 2023-11-28 08:50:48,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.921e+01 9.569e+01 1.013e+02 1.910e+02, threshold=1.914e+02, percent-clipped=1.0 2023-11-28 08:50:51,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429406.6666666665, ans=0.1 2023-11-28 08:50:55,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429473.3333333335, ans=0.1 2023-11-28 08:50:57,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3429473.3333333335, ans=0.125 2023-11-28 08:51:26,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3429606.6666666665, ans=0.0 2023-11-28 08:51:27,032 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514450 2023-11-28 08:51:28,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3429606.6666666665, ans=15.0 2023-11-28 08:51:30,088 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9450, loss[loss=0.06047, simple_loss=0.08956, pruned_loss=0.007783, audio_tagging_loss=0.007909, over 15213.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09033, pruned_loss=0.01219, audio_tagging_loss=0.008593, over 3053980.78 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:51:32,410 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:51:37,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-11-28 08:52:07,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-28 08:52:18,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3429940.0, ans=0.0 2023-11-28 08:52:23,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514500 2023-11-28 08:52:27,131 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9500, loss[loss=0.06855, simple_loss=0.09222, pruned_loss=0.01158, audio_tagging_loss=0.01086, over 15412.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09016, pruned_loss=0.01219, audio_tagging_loss=0.008714, over 3055476.10 frames. 
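The grad_scale field in the loss records moves through 32.0 → 16.0 → 8.0 across this stretch and later recovers to 32.0, the signature of dynamic fp16 loss scaling: halve the scale when a step overflows, double it after a long enough run of clean steps. A sketch with PyTorch's stock GradScaler (the growth_interval here is illustrative, not read from the training config):

```python
import torch

# Sketch of the fp16 loss-scaling behaviour behind the grad_scale field:
# the scaler halves its scale when a step overflows and doubles it after
# growth_interval clean steps, which is why the log walks through
# 32.0 -> 16.0 -> 8.0 and back up again. Values are illustrative.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the scale at the start of this stretch
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,  # an assumption, not read from the config
)

def training_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)      # skipped internally if grads overflowed
    scaler.update()             # adjusts the scale (halve / grow)
```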
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:52:42,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 9.109e+01 9.581e+01 1.028e+02 2.016e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-28 08:53:00,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3430206.6666666665, ans=0.125 2023-11-28 08:53:20,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514550 2023-11-28 08:53:22,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2023-11-28 08:53:23,784 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9550, loss[loss=0.075, simple_loss=0.0971, pruned_loss=0.01635, audio_tagging_loss=0.0101, over 15438.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09049, pruned_loss=0.01206, audio_tagging_loss=0.008806, over 3052077.09 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:53:25,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3430340.0, ans=0.125 2023-11-28 08:53:34,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3430406.6666666665, ans=0.125 2023-11-28 08:53:34,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3430406.6666666665, ans=0.02 2023-11-28 08:53:34,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3430406.6666666665, ans=0.125 2023-11-28 08:53:40,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3430406.6666666665, ans=0.0 2023-11-28 08:54:05,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3430540.0, ans=0.125 2023-11-28 08:54:17,712 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514600 2023-11-28 08:54:21,562 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9600, loss[loss=0.06248, simple_loss=0.08379, pruned_loss=0.01106, audio_tagging_loss=0.009531, over 15773.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08986, pruned_loss=0.01189, audio_tagging_loss=0.008879, over 3051283.16 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:54:21,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3430673.3333333335, ans=0.2 2023-11-28 08:54:37,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.954e+01 8.927e+01 9.333e+01 1.014e+02 1.212e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 08:54:58,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3430873.3333333335, ans=0.0 2023-11-28 08:55:11,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3430940.0, ans=0.2 2023-11-28 08:55:16,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-28 08:55:19,456 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9650, loss[loss=0.04433, simple_loss=0.05666, pruned_loss=0.007413, audio_tagging_loss=0.008588, over 15912.00 frames. 
], tot_loss[loss=0.06591, simple_loss=0.08988, pruned_loss=0.01209, audio_tagging_loss=0.008878, over 3048624.11 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:55:30,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3431073.3333333335, ans=0.0 2023-11-28 08:55:33,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2023-11-28 08:55:42,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3431140.0, ans=0.0 2023-11-28 08:55:47,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3431140.0, ans=0.1 2023-11-28 08:55:52,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3431206.6666666665, ans=0.2 2023-11-28 08:56:07,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-28 08:56:11,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3431273.3333333335, ans=0.1 2023-11-28 08:56:13,096 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-28 08:56:16,275 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9700, loss[loss=0.05138, simple_loss=0.06734, pruned_loss=0.009142, audio_tagging_loss=0.008568, over 15048.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08972, pruned_loss=0.01218, audio_tagging_loss=0.008789, over 3039355.16 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:56:19,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3431340.0, ans=0.125 2023-11-28 08:56:25,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3431340.0, ans=0.2 2023-11-28 08:56:31,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3431406.6666666665, ans=0.0 2023-11-28 08:56:32,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.860e+01 9.541e+01 1.023e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 08:56:45,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3431473.3333333335, ans=0.125 2023-11-28 08:56:55,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3431540.0, ans=0.0 2023-11-28 08:57:09,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-28 08:57:12,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3431673.3333333335, ans=0.0 2023-11-28 08:57:13,585 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9750, loss[loss=0.05865, simple_loss=0.08141, pruned_loss=0.009505, audio_tagging_loss=0.008433, over 16284.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08935, pruned_loss=0.01216, audio_tagging_loss=0.008782, over 3040753.00 frames. 
], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:57:14,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2023-11-28 08:57:23,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3431673.3333333335, ans=0.125 2023-11-28 08:57:41,282 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:58:01,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3431940.0, ans=0.0 2023-11-28 08:58:02,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-28 08:58:07,942 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-28 08:58:11,307 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9800, loss[loss=0.06372, simple_loss=0.08937, pruned_loss=0.009976, audio_tagging_loss=0.009064, over 16717.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08851, pruned_loss=0.01198, audio_tagging_loss=0.008725, over 3044760.57 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:58:15,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3432006.6666666665, ans=0.0 2023-11-28 08:58:18,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3432006.6666666665, ans=0.09899494936611666 2023-11-28 08:58:22,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3432073.3333333335, ans=0.2 2023-11-28 08:58:27,618 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.900e+01 9.501e+01 1.026e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 08:58:30,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3432073.3333333335, ans=0.05 2023-11-28 08:58:36,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3432140.0, ans=0.2 2023-11-28 08:58:58,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=12.0 2023-11-28 08:59:05,122 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-28 08:59:06,175 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:59:08,271 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9850, loss[loss=0.06622, simple_loss=0.0978, pruned_loss=0.009559, audio_tagging_loss=0.00776, over 15645.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.0885, pruned_loss=0.01197, audio_tagging_loss=0.008741, over 3044823.74 frames. 
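Many of the scheduled values above are *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate); by this point in training most read ans=0.0 or a small constant, suggesting a layer-drop-style regularizer that randomly bypasses a sub-module early on and runs it deterministically later. A hedged sketch of that mechanism, not taken from scaling.py:

```python
import torch

# Hedged sketch of what a *_skip_rate schedule could gate: with
# probability skip_rate the sub-module's contribution is dropped for
# the whole batch. By this point in the log most skip rates read
# ans=0.0, i.e. the modules always run. Illustrative only.
def maybe_skip(module: torch.nn.Module, x: torch.Tensor,
               skip_rate: float, training: bool) -> torch.Tensor:
    if training and float(torch.rand(())) < skip_rate:
        return x                # skip: pass the input straight through
    return x + module(x)        # normal residual path
```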
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:59:12,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3432340.0, ans=0.1 2023-11-28 08:59:32,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-28 08:59:34,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3432473.3333333335, ans=0.125 2023-11-28 08:59:39,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-28 08:59:42,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2023-11-28 08:59:43,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3432540.0, ans=0.0 2023-11-28 08:59:57,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3432606.6666666665, ans=0.0 2023-11-28 08:59:58,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-28 09:00:01,455 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-28 09:00:03,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3432673.3333333335, ans=0.1 2023-11-28 09:00:04,608 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9900, loss[loss=0.04145, simple_loss=0.04905, pruned_loss=0.005117, audio_tagging_loss=0.0118, over 15287.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08826, pruned_loss=0.012, audio_tagging_loss=0.008697, over 3039374.00 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 09:00:04,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3432673.3333333335, ans=0.125 2023-11-28 09:00:15,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3432740.0, ans=0.125 2023-11-28 09:00:23,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.866e+01 9.531e+01 1.026e+02 1.362e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 09:00:59,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-28 09:01:03,317 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 9950, loss[loss=0.06436, simple_loss=0.08043, pruned_loss=0.01046, audio_tagging_loss=0.01369, over 15483.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.0889, pruned_loss=0.0121, audio_tagging_loss=0.00867, over 3052229.11 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:01:09,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3433006.6666666665, ans=0.2 2023-11-28 09:01:37,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. 
limit=15.0 2023-11-28 09:01:38,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3433206.6666666665, ans=0.125 2023-11-28 09:01:41,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3433206.6666666665, ans=0.02 2023-11-28 09:01:53,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2023-11-28 09:01:57,412 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-28 09:02:00,835 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10000, loss[loss=0.07711, simple_loss=0.1028, pruned_loss=0.01737, audio_tagging_loss=0.00831, over 15264.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08882, pruned_loss=0.01196, audio_tagging_loss=0.008604, over 3054480.86 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:02:02,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2023-11-28 09:02:14,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3433406.6666666665, ans=0.04949747468305833 2023-11-28 09:02:18,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.838e+01 9.507e+01 1.055e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 09:02:25,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3433473.3333333335, ans=0.125 2023-11-28 09:02:28,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3433473.3333333335, ans=0.125 2023-11-28 09:02:45,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3433606.6666666665, ans=0.1 2023-11-28 09:02:54,366 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-28 09:02:57,653 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10050, loss[loss=0.05741, simple_loss=0.08127, pruned_loss=0.008254, audio_tagging_loss=0.008521, over 14528.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08969, pruned_loss=0.01215, audio_tagging_loss=0.008644, over 3053714.45 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:03:02,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3433673.3333333335, ans=0.125 2023-11-28 09:03:42,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-28 09:03:48,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3433940.0, ans=0.0 2023-11-28 09:03:51,666 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-28 09:03:55,269 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10100, loss[loss=0.07942, simple_loss=0.1112, pruned_loss=0.01594, audio_tagging_loss=0.007899, over 15347.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08949, pruned_loss=0.01212, audio_tagging_loss=0.008717, over 3058556.08 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:00,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3434006.6666666665, ans=0.125 2023-11-28 09:04:07,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3434073.3333333335, ans=0.0 2023-11-28 09:04:13,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.780e+01 9.411e+01 9.939e+01 1.267e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 09:04:16,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3434073.3333333335, ans=0.125 2023-11-28 09:04:43,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-28 09:04:45,969 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:04:47,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3434273.3333333335, ans=10.0 2023-11-28 09:04:49,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-28 09:04:53,049 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10150, loss[loss=0.06081, simple_loss=0.07971, pruned_loss=0.01167, audio_tagging_loss=0.009288, over 15661.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08986, pruned_loss=0.01215, audio_tagging_loss=0.008708, over 3057459.80 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:53,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3434340.0, ans=0.125 2023-11-28 09:05:08,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3434406.6666666665, ans=0.0 2023-11-28 09:05:08,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3434406.6666666665, ans=0.07 2023-11-28 09:05:16,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-28 09:05:23,676 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:05:27,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. 
limit=15.0 2023-11-28 09:05:29,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3434540.0, ans=0.0 2023-11-28 09:05:37,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3434606.6666666665, ans=0.0 2023-11-28 09:05:45,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-28 09:05:49,224 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10200, loss[loss=0.09028, simple_loss=0.1299, pruned_loss=0.0175, audio_tagging_loss=0.007825, over 16138.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09028, pruned_loss=0.01225, audio_tagging_loss=0.008834, over 3063429.78 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:05:51,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3434673.3333333335, ans=0.0 2023-11-28 09:06:08,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.883e+01 9.493e+01 1.013e+02 1.248e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:06:09,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-11-28 09:06:10,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3434740.0, ans=0.125 2023-11-28 09:06:12,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3434806.6666666665, ans=0.1 2023-11-28 09:06:14,812 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:06:25,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3434873.3333333335, ans=0.125 2023-11-28 09:06:43,235 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-28 09:06:45,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-28 09:06:46,375 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10250, loss[loss=0.04606, simple_loss=0.04923, pruned_loss=0.01198, audio_tagging_loss=0.009465, over 14806.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.0898, pruned_loss=0.0123, audio_tagging_loss=0.008917, over 3060714.90 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:06:46,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3435006.6666666665, ans=0.125 2023-11-28 09:06:55,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.48 vs. 
limit=15.0 2023-11-28 09:07:40,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-28 09:07:44,050 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10300, loss[loss=0.07482, simple_loss=0.09237, pruned_loss=0.01829, audio_tagging_loss=0.01035, over 15204.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08985, pruned_loss=0.01225, audio_tagging_loss=0.008899, over 3060048.71 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:07:44,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3435340.0, ans=0.125 2023-11-28 09:07:55,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3435406.6666666665, ans=0.035 2023-11-28 09:07:56,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3435406.6666666665, ans=0.125 2023-11-28 09:08:01,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2023-11-28 09:08:01,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.048e+01 9.599e+01 1.061e+02 1.681e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 09:08:28,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2023-11-28 09:08:37,146 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-28 09:08:40,333 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10350, loss[loss=0.08002, simple_loss=0.1111, pruned_loss=0.01562, audio_tagging_loss=0.008851, over 15227.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09024, pruned_loss=0.01229, audio_tagging_loss=0.008956, over 3056740.65 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:08:41,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3435673.3333333335, ans=0.0 2023-11-28 09:08:59,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3435740.0, ans=0.125 2023-11-28 09:09:32,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2023-11-28 09:09:33,525 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-28 09:09:36,954 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10400, loss[loss=0.06502, simple_loss=0.08847, pruned_loss=0.01056, audio_tagging_loss=0.01022, over 15921.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09008, pruned_loss=0.01234, audio_tagging_loss=0.009009, over 3056533.47 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:09:41,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. 
limit=6.0 2023-11-28 09:09:54,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.993e+01 9.634e+01 1.025e+02 1.288e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 09:10:18,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3436206.6666666665, ans=0.125 2023-11-28 09:10:30,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-28 09:10:33,225 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10450, loss[loss=0.06215, simple_loss=0.08054, pruned_loss=0.01388, audio_tagging_loss=0.007995, over 14883.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08962, pruned_loss=0.01215, audio_tagging_loss=0.008973, over 3052533.38 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:11:19,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3436606.6666666665, ans=0.125 2023-11-28 09:11:21,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3436606.6666666665, ans=0.125 2023-11-28 09:11:27,004 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-28 09:11:28,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2023-11-28 09:11:30,125 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10500, loss[loss=0.06518, simple_loss=0.09271, pruned_loss=0.01131, audio_tagging_loss=0.007525, over 15398.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09005, pruned_loss=0.01221, audio_tagging_loss=0.008739, over 3058766.35 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:11:42,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3436740.0, ans=0.125 2023-11-28 09:11:48,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.855e+01 9.374e+01 1.019e+02 1.300e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 09:11:52,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3436740.0, ans=0.0 2023-11-28 09:12:16,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3436940.0, ans=0.125 2023-11-28 09:12:25,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-28 09:12:28,322 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10550, loss[loss=0.07215, simple_loss=0.09792, pruned_loss=0.01486, audio_tagging_loss=0.008332, over 15439.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09021, pruned_loss=0.01226, audio_tagging_loss=0.008769, over 3059302.50 frames. 
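The remaining recurring schedules are balancer parameters: application probabilities (balancer1.prob, balancer2.prob, typically ans=0.125) and bounds on channel statistics (min_positive=0.05, min_abs=0.02, max_abs=10.0 elsewhere in this log). The sketch below illustrates the idea as an auxiliary penalty applied on a random 12.5% of batches; the real balancer in scaling.py is understood to act on gradients in the backward pass instead:

```python
import torch

# Hedged sketch of an activation balancer: on a random fraction `prob`
# of training batches (the logged ans=0.125), penalize per-channel
# statistics that leave the configured bounds (min_positive, min_abs,
# max_abs in the log). Illustrative only; the real balancer is
# understood to modify gradients rather than add a loss term.
def balancer_penalty(x: torch.Tensor, prob: float = 0.125,
                     min_positive: float = 0.05,
                     min_abs: float = 0.02,
                     max_abs: float = 10.0) -> torch.Tensor:
    if float(torch.rand(())) >= prob:
        return x.new_zeros(())                  # inactive this batch
    frac_pos = (x > 0).float().mean(dim=0)      # per-channel positive rate
    mean_abs = x.abs().mean(dim=0)              # per-channel magnitude
    return ((min_positive - frac_pos).clamp(min=0).sum()
            + (min_abs - mean_abs).clamp(min=0).sum()
            + (mean_abs - max_abs).clamp(min=0).sum())
```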
], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:12:45,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3437073.3333333335, ans=0.125 2023-11-28 09:13:21,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-28 09:13:23,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3437273.3333333335, ans=0.05 2023-11-28 09:13:25,233 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10600, loss[loss=0.06024, simple_loss=0.08324, pruned_loss=0.00987, audio_tagging_loss=0.008753, over 15343.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.0899, pruned_loss=0.01223, audio_tagging_loss=0.008687, over 3053626.35 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:13:30,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3437340.0, ans=0.0 2023-11-28 09:13:42,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3437406.6666666665, ans=0.125 2023-11-28 09:13:42,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 9.109e+01 9.906e+01 1.072e+02 1.462e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-28 09:13:47,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. limit=6.0 2023-11-28 09:13:58,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3437540.0, ans=0.125 2023-11-28 09:14:02,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3437540.0, ans=0.125 2023-11-28 09:14:04,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2023-11-28 09:14:05,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3437540.0, ans=0.125 2023-11-28 09:14:11,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3437606.6666666665, ans=0.0 2023-11-28 09:14:13,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3437606.6666666665, ans=0.2 2023-11-28 09:14:17,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-28 09:14:21,254 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10650, loss[loss=0.0523, simple_loss=0.06489, pruned_loss=0.009751, audio_tagging_loss=0.0101, over 14546.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09124, pruned_loss=0.01247, audio_tagging_loss=0.008553, over 3054276.45 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:14:28,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3437673.3333333335, ans=0.2 2023-11-28 09:14:34,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. 
limit=15.0 2023-11-28 09:14:37,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3437740.0, ans=0.0 2023-11-28 09:14:41,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3437740.0, ans=0.0 2023-11-28 09:14:48,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3437806.6666666665, ans=0.07 2023-11-28 09:14:57,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-28 09:15:11,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3437940.0, ans=0.1 2023-11-28 09:15:14,212 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-28 09:15:17,432 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10700, loss[loss=0.06656, simple_loss=0.08523, pruned_loss=0.015, audio_tagging_loss=0.00895, over 14901.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09109, pruned_loss=0.01244, audio_tagging_loss=0.008536, over 3052271.94 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:15:19,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3438006.6666666665, ans=0.0 2023-11-28 09:15:36,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.910e+01 9.467e+01 1.013e+02 1.295e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 09:15:37,013 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:15:39,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3438140.0, ans=0.125 2023-11-28 09:15:39,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3438140.0, ans=0.07 2023-11-28 09:15:44,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3438140.0, ans=0.0 2023-11-28 09:15:49,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3438140.0, ans=0.125 2023-11-28 09:15:53,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-28 09:15:55,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-28 09:16:10,845 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-28 09:16:13,978 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10750, loss[loss=0.04304, simple_loss=0.0627, pruned_loss=0.005574, audio_tagging_loss=0.006116, over 14828.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.0904, pruned_loss=0.01232, audio_tagging_loss=0.008488, over 3051027.87 frames. 
], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:16:48,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3438540.0, ans=10.0 2023-11-28 09:16:49,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3438540.0, ans=0.0 2023-11-28 09:16:51,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3438540.0, ans=0.2 2023-11-28 09:16:52,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3438540.0, ans=0.125 2023-11-28 09:17:06,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-28 09:17:10,082 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10800, loss[loss=0.06018, simple_loss=0.07471, pruned_loss=0.01327, audio_tagging_loss=0.009558, over 14686.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08964, pruned_loss=0.01219, audio_tagging_loss=0.008504, over 3047203.60 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:17:13,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3438673.3333333335, ans=0.125 2023-11-28 09:17:27,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3438740.0, ans=0.125 2023-11-28 09:17:29,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.659e+01 9.192e+01 9.823e+01 1.353e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-28 09:17:39,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3438806.6666666665, ans=0.0 2023-11-28 09:18:02,577 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-28 09:18:06,533 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10850, loss[loss=0.06416, simple_loss=0.08803, pruned_loss=0.01273, audio_tagging_loss=0.007412, over 14920.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08923, pruned_loss=0.01231, audio_tagging_loss=0.008482, over 3042101.77 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:18:24,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3439073.3333333335, ans=0.1 2023-11-28 09:18:59,479 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-28 09:19:03,287 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10900, loss[loss=0.05324, simple_loss=0.06845, pruned_loss=0.00969, audio_tagging_loss=0.009328, over 13993.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08855, pruned_loss=0.01217, audio_tagging_loss=0.008628, over 3041763.90 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:19:03,308 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 09:19:05,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3439340.0, ans=0.125 2023-11-28 09:19:12,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3439340.0, ans=0.2 2023-11-28 09:19:12,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3439340.0, ans=0.07 2023-11-28 09:19:12,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0 2023-11-28 09:19:21,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.090e+01 9.658e+01 1.040e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 09:19:44,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-28 09:19:54,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3439606.6666666665, ans=0.125 2023-11-28 09:19:54,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439606.6666666665, ans=0.1 2023-11-28 09:19:55,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3439606.6666666665, ans=0.0 2023-11-28 09:19:56,344 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-28 09:19:59,474 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 10950, loss[loss=0.08928, simple_loss=0.1315, pruned_loss=0.0191, audio_tagging_loss=0.00442, over 15511.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08916, pruned_loss=0.01229, audio_tagging_loss=0.008554, over 3046814.59 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:20:00,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3439673.3333333335, ans=0.125 2023-11-28 09:20:04,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3439673.3333333335, ans=0.2 2023-11-28 09:20:09,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0 2023-11-28 09:20:10,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2023-11-28 09:20:21,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-28 09:20:25,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-28 09:20:29,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-28 09:20:49,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.89 vs. 
limit=22.5 2023-11-28 09:20:52,105 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-28 09:20:57,615 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11000, loss[loss=0.05984, simple_loss=0.0753, pruned_loss=0.01197, audio_tagging_loss=0.01022, over 14611.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08991, pruned_loss=0.01235, audio_tagging_loss=0.008568, over 3044262.55 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:21:10,812 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:21:17,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.606e+01 9.397e+01 9.983e+01 1.237e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 09:21:21,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3440140.0, ans=0.125 2023-11-28 09:21:31,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-28 09:21:48,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2023-11-28 09:21:51,216 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-28 09:21:54,936 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11050, loss[loss=0.05428, simple_loss=0.07241, pruned_loss=0.00955, audio_tagging_loss=0.008523, over 16313.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08987, pruned_loss=0.01241, audio_tagging_loss=0.008627, over 3046014.72 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:22:01,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3440340.0, ans=0.07 2023-11-28 09:22:02,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3440340.0, ans=0.07 2023-11-28 09:22:24,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3440473.3333333335, ans=0.0 2023-11-28 09:22:45,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3440606.6666666665, ans=0.1 2023-11-28 09:22:47,761 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:22:48,722 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-28 09:22:52,005 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11100, loss[loss=0.05985, simple_loss=0.08095, pruned_loss=0.008807, audio_tagging_loss=0.01057, over 15178.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0896, pruned_loss=0.01243, audio_tagging_loss=0.008662, over 3047037.65 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:23:02,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3440740.0, ans=0.2 2023-11-28 09:23:04,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3440740.0, ans=0.2 2023-11-28 09:23:11,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2023-11-28 09:23:12,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.852e+01 9.435e+01 1.052e+02 1.493e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 09:23:45,867 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-28 09:23:47,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3440940.0, ans=0.1 2023-11-28 09:23:49,045 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11150, loss[loss=0.08541, simple_loss=0.1234, pruned_loss=0.01704, audio_tagging_loss=0.006688, over 15670.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08904, pruned_loss=0.01226, audio_tagging_loss=0.008871, over 3045071.17 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:23:54,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3441006.6666666665, ans=0.125 2023-11-28 09:24:12,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-28 09:24:15,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3441140.0, ans=0.0 2023-11-28 09:24:15,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3441140.0, ans=0.0 2023-11-28 09:24:15,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2023-11-28 09:24:27,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3441206.6666666665, ans=0.125 2023-11-28 09:24:41,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3441273.3333333335, ans=0.1 2023-11-28 09:24:43,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-28 09:24:43,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3441273.3333333335, ans=0.1 2023-11-28 09:24:47,385 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11200, loss[loss=0.05758, simple_loss=0.07996, pruned_loss=0.009472, audio_tagging_loss=0.008134, over 15042.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08942, pruned_loss=0.01233, audio_tagging_loss=0.008907, over 3039365.34 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:25:07,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.684e+01 9.493e+01 1.049e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:25:11,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3441473.3333333335, ans=0.0 2023-11-28 09:25:23,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=12.0 2023-11-28 09:25:29,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3441540.0, ans=0.025 2023-11-28 09:25:32,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3441606.6666666665, ans=0.0 2023-11-28 09:25:37,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3441606.6666666665, ans=0.125 2023-11-28 09:25:38,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.88 vs. limit=6.0 2023-11-28 09:25:41,300 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-28 09:25:44,999 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11250, loss[loss=0.07334, simple_loss=0.09899, pruned_loss=0.01412, audio_tagging_loss=0.00972, over 14772.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08929, pruned_loss=0.01227, audio_tagging_loss=0.008796, over 3043158.11 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:25:51,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-28 09:25:54,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3441673.3333333335, ans=0.1 2023-11-28 09:26:17,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3441806.6666666665, ans=0.125 2023-11-28 09:26:38,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-28 09:26:41,912 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11300, loss[loss=0.07173, simple_loss=0.1027, pruned_loss=0.01444, audio_tagging_loss=0.005948, over 15528.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08906, pruned_loss=0.01229, audio_tagging_loss=0.00859, over 3038500.76 frames. 
], batch size: 59, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:26:43,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3442006.6666666665, ans=0.2 2023-11-28 09:27:00,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3442073.3333333335, ans=0.125 2023-11-28 09:27:02,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.901e+01 9.622e+01 1.003e+02 2.071e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-28 09:27:04,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3442140.0, ans=0.125 2023-11-28 09:27:10,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3442140.0, ans=0.0 2023-11-28 09:27:25,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3442206.6666666665, ans=0.125 2023-11-28 09:27:27,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442273.3333333335, ans=0.1 2023-11-28 09:27:28,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-11-28 09:27:35,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-28 09:27:38,734 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11350, loss[loss=0.0802, simple_loss=0.1096, pruned_loss=0.01663, audio_tagging_loss=0.008768, over 14693.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08912, pruned_loss=0.01226, audio_tagging_loss=0.008633, over 3042364.05 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:27:46,382 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:28:05,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3442473.3333333335, ans=0.125 2023-11-28 09:28:32,972 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-28 09:28:36,506 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11400, loss[loss=0.07011, simple_loss=0.08589, pruned_loss=0.015, audio_tagging_loss=0.01217, over 14938.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08879, pruned_loss=0.01227, audio_tagging_loss=0.008582, over 3045549.05 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:28:40,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3442673.3333333335, ans=0.125 2023-11-28 09:28:56,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.771e+01 9.196e+01 9.896e+01 1.286e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 09:28:56,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3442740.0, ans=0.0 2023-11-28 09:29:00,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3442806.6666666665, ans=0.0 2023-11-28 09:29:18,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3442873.3333333335, ans=10.0 2023-11-28 09:29:26,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3442940.0, ans=0.125 2023-11-28 09:29:30,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-28 09:29:33,460 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11450, loss[loss=0.05376, simple_loss=0.06289, pruned_loss=0.01122, audio_tagging_loss=0.0111, over 15212.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08846, pruned_loss=0.0123, audio_tagging_loss=0.008645, over 3048684.63 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:29:39,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2023-11-28 09:29:55,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3443140.0, ans=0.2 2023-11-28 09:30:01,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3443140.0, ans=0.125 2023-11-28 09:30:15,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3443206.6666666665, ans=0.2 2023-11-28 09:30:21,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-28 09:30:22,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-28 09:30:27,784 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-28 09:30:30,966 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11500, loss[loss=0.07058, simple_loss=0.08931, pruned_loss=0.01565, audio_tagging_loss=0.01027, over 15163.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08954, pruned_loss=0.0125, audio_tagging_loss=0.008553, over 3043770.98 frames. 
], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:30:40,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3443340.0, ans=0.1 2023-11-28 09:30:42,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3443406.6666666665, ans=0.0 2023-11-28 09:30:52,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.608e+01 9.367e+01 9.940e+01 1.192e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 09:30:52,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3443473.3333333335, ans=0.125 2023-11-28 09:31:15,971 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:31:16,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.38 vs. limit=22.5 2023-11-28 09:31:25,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-28 09:31:28,720 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11550, loss[loss=0.06322, simple_loss=0.08333, pruned_loss=0.01174, audio_tagging_loss=0.009817, over 15074.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08986, pruned_loss=0.01249, audio_tagging_loss=0.008521, over 3045997.14 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:31:28,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3443673.3333333335, ans=0.125 2023-11-28 09:31:56,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3443806.6666666665, ans=0.07 2023-11-28 09:32:06,706 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:32:06,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3443873.3333333335, ans=0.0 2023-11-28 09:32:21,768 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-28 09:32:25,236 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11600, loss[loss=0.07526, simple_loss=0.1084, pruned_loss=0.01505, audio_tagging_loss=0.006021, over 16181.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08923, pruned_loss=0.01229, audio_tagging_loss=0.008621, over 3050824.77 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:32:38,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3444073.3333333335, ans=0.125 2023-11-28 09:32:47,370 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.769e+01 9.333e+01 1.033e+02 1.788e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 09:33:05,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-28 09:33:07,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3444206.6666666665, ans=0.0 2023-11-28 09:33:11,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3444273.3333333335, ans=0.0 2023-11-28 09:33:18,774 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-28 09:33:22,569 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11650, loss[loss=0.06268, simple_loss=0.08458, pruned_loss=0.01193, audio_tagging_loss=0.008461, over 15485.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09066, pruned_loss=0.01247, audio_tagging_loss=0.008575, over 3051538.73 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:34:01,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3444540.0, ans=0.0 2023-11-28 09:34:17,095 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516700 2023-11-28 09:34:20,381 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11700, loss[loss=0.05246, simple_loss=0.06205, pruned_loss=0.01047, audio_tagging_loss=0.01096, over 15413.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08957, pruned_loss=0.01225, audio_tagging_loss=0.008698, over 3045918.48 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:34:20,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3444673.3333333335, ans=0.0 2023-11-28 09:34:42,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.763e+01 9.224e+01 1.034e+02 1.340e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 09:34:49,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3444806.6666666665, ans=0.0 2023-11-28 09:34:50,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-11-28 09:34:55,954 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:35:07,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3444940.0, ans=0.125 2023-11-28 09:35:14,281 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-28 09:35:17,441 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11750, loss[loss=0.05855, simple_loss=0.07863, pruned_loss=0.009691, audio_tagging_loss=0.00954, over 15163.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0891, pruned_loss=0.01213, audio_tagging_loss=0.008709, over 3040122.68 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:35:24,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445006.6666666665, ans=0.1 2023-11-28 09:35:32,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-28 09:35:37,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3445073.3333333335, ans=8.0 2023-11-28 09:35:41,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-28 09:35:42,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3445140.0, ans=0.125 2023-11-28 09:35:58,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=22.5 2023-11-28 09:35:59,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3445206.6666666665, ans=0.2 2023-11-28 09:36:08,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3445273.3333333335, ans=0.0 2023-11-28 09:36:09,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0 2023-11-28 09:36:10,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-28 09:36:13,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-11-28 09:36:14,231 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11800, loss[loss=0.06973, simple_loss=0.09505, pruned_loss=0.01391, audio_tagging_loss=0.0083, over 15094.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08964, pruned_loss=0.01222, audio_tagging_loss=0.008721, over 3039995.11 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:36:14,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3445340.0, ans=0.125 2023-11-28 09:36:25,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3445406.6666666665, ans=0.07 2023-11-28 09:36:26,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3445406.6666666665, ans=0.1 2023-11-28 09:36:37,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.864e+01 9.665e+01 1.018e+02 1.283e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 09:36:38,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3445473.3333333335, ans=0.125 2023-11-28 09:36:45,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3445473.3333333335, ans=0.0 2023-11-28 09:37:06,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3445606.6666666665, ans=0.125 2023-11-28 09:37:06,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-28 09:37:08,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-28 09:37:10,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3445606.6666666665, ans=0.0 2023-11-28 09:37:11,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3445673.3333333335, ans=0.125 2023-11-28 09:37:12,344 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11850, loss[loss=0.04843, simple_loss=0.0593, pruned_loss=0.008796, audio_tagging_loss=0.009985, over 16344.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09008, pruned_loss=0.01232, audio_tagging_loss=0.008754, over 3039559.67 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:37:31,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3445740.0, ans=0.0 2023-11-28 09:37:33,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3445806.6666666665, ans=0.2 2023-11-28 09:37:44,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3445806.6666666665, ans=0.125 2023-11-28 09:38:06,159 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-28 09:38:09,351 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11900, loss[loss=0.0768, simple_loss=0.1117, pruned_loss=0.0124, audio_tagging_loss=0.008538, over 15327.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08974, pruned_loss=0.0122, audio_tagging_loss=0.008904, over 3035509.00 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:38:32,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.705e+01 9.389e+01 1.010e+02 1.284e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 09:38:57,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3446273.3333333335, ans=0.2 2023-11-28 09:39:00,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2023-11-28 09:39:02,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3446273.3333333335, ans=0.125 2023-11-28 09:39:02,995 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-28 09:39:06,129 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 11950, loss[loss=0.06839, simple_loss=0.08852, pruned_loss=0.009837, audio_tagging_loss=0.01429, over 15293.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08844, pruned_loss=0.01219, audio_tagging_loss=0.009124, over 3034583.31 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:39:08,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3446340.0, ans=0.125 2023-11-28 09:39:45,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3446540.0, ans=0.0 2023-11-28 09:39:46,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3446540.0, ans=0.0 2023-11-28 09:39:53,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3446606.6666666665, ans=0.125 2023-11-28 09:39:58,681 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-28 09:40:01,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3446673.3333333335, ans=0.125 2023-11-28 09:40:01,995 INFO [train_asr.py:1235] (1/4) Epoch 43, batch 12000, loss[loss=0.07738, simple_loss=0.1013, pruned_loss=0.0201, audio_tagging_loss=0.006626, over 15526.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08921, pruned_loss=0.01238, audio_tagging_loss=0.009179, over 3038632.07 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:40:01,996 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 09:40:16,661 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2180, 3.9442, 3.4642, 3.8317], device='cuda:1') 2023-11-28 09:40:19,780 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8772, 2.3864, 2.7047, 2.5347], device='cuda:1') 2023-11-28 09:40:31,998 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9825, 3.9811, 4.8612, 4.4884], device='cuda:1') 2023-11-28 09:40:36,963 INFO [train_asr.py:1267] (1/4) Epoch 43, validation: loss=0.05826, simple_loss=0.05053, pruned_loss=0.005231, audio_tagging_loss=0.02777, over 4681554.00 frames. 
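A note on reading the loss fields in the entries above: throughout this span the reported loss is consistent with a fixed weighted sum of the three logged components, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. The validation entry just above checks out as 0.5 * 0.05053 + 0.005231 + 0.02777 ≈ 0.05826, and the training tot_loss entries obey the same relation (batch 10300: 0.5 * 0.08985 + 0.01225 + 0.008899 ≈ 0.06607). The 0.5 and 1.0 weights are inferred from these numbers rather than quoted from the training script; a minimal sketch:

    # Sketch only: recombine the logged loss components. The weights are
    # inferred from the logged totals, which they reproduce within rounding.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, tagging_scale=1.0):
        return (simple_scale * simple_loss
                + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # Validation entry above: 0.5 * 0.05053 + 0.005231 + 0.02777 ≈ 0.05826
    assert abs(combined_loss(0.05053, 0.005231, 0.02777) - 0.05826) < 1e-4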
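The recurring optim.py lines summarize gradient-norm statistics over a window of recent batches; the five values after "grad-norm quartiles" presumably read as min, 25%, median, 75%, max. In every such entry here the threshold equals Clipping_scale times the median (entry above: 2.0 * 9.389e+01 ≈ 1.878e+02), so the clipping level adapts to the recent gradient scale. This relationship is reconstructed from the logged numbers, not quoted from optim.py:

    import statistics

    # Sketch (inferred from the log, not the actual optim.py code): the
    # clipping threshold tracks the median of recent gradient norms,
    # scaled by Clipping_scale.
    def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
        return clipping_scale * statistics.median(recent_grad_norms)

    # Quartiles from the entry above reproduce the logged threshold:
    print(clipping_threshold([74.20, 87.05, 93.89, 101.0, 128.4]))  # ~187.78

percent-clipped stays at 0.0 almost everywhere in this stretch, meaning essentially no batch exceeded twice the median gradient norm; the lone percent-clipped=1.0 entry earlier (threshold 1.924e+02 against a max of 2.071e+02) appears to mark the only window here in which any gradient was actually clipped.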
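The WARNING entries that drop AudioSet cuts are a length sanity check: a one-second dummy-text cut has 100 feature frames, only 23 frames after the encoder's subsampling, but 24 BPE tokens, and transducer training needs at least as many output frames as tokens. The exact subsampling formula below is an inference from the logged 100 -> 23 mapping (consistent with two stride-2 stages), not a quotation of the recipe:

    # Sketch of the length filter implied by the WARNING entries.
    def subsampled_frames(num_frames: int) -> int:
        # Consistent with the logged mapping: 100 frames -> 23 frames.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return subsampled_frames(num_frames) >= num_tokens

    assert subsampled_frames(100) == 23
    assert not keep_cut(100, 24)  # the excluded dummy-text cuts: 23 < 24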
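The frequent scaling.py ScheduledFloat lines record module hyper-parameters (skip rates, balancer probs, bypass scales) whose values are deterministic functions of a batch counter; by this point most skip rates have annealed to 0.0 while many balancer probs read 0.125. A toy piecewise-linear schedule in that spirit; the breakpoints are illustrative, not taken from scaling.py:

    # Illustrative only: a piecewise-linear float schedule evaluated at the
    # current batch count. Breakpoints (0, 0.3) and (20000, 0.0) are made up.
    def scheduled_float(batch_count: float,
                        points=((0.0, 0.3), (20000.0, 0.0))) -> float:
        (x0, y0), (x1, y1) = points
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        frac = (batch_count - x0) / (x1 - x0)
        return y0 + frac * (y1 - y0)

    print(scheduled_float(3446673.0))  # far past the last breakpoint -> 0.0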
2023-11-28 09:40:36,963 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 09:40:57,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.981e+01 9.596e+01 1.044e+02 1.233e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 09:41:18,056 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 0, loss[loss=0.0687, simple_loss=0.06803, pruned_loss=0.01016, audio_tagging_loss=0.02453, over 15540.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.06803, pruned_loss=0.01016, audio_tagging_loss=0.02453, over 15540.00 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:41:18,057 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 09:41:43,286 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9862, 5.8738, 5.6632, 5.5667], device='cuda:1') 2023-11-28 09:41:48,358 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8311, 5.8976, 5.9055, 5.9399], device='cuda:1') 2023-11-28 09:41:52,338 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05791, simple_loss=0.05054, pruned_loss=0.00521, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 09:41:52,339 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 09:41:58,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3446840.0, ans=0.0 2023-11-28 09:42:19,012 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517050 2023-11-28 09:42:37,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3447040.0, ans=0.125 2023-11-28 09:42:44,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3447106.6666666665, ans=0.1 2023-11-28 09:42:50,859 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 50, loss[loss=0.07286, simple_loss=0.08557, pruned_loss=0.01231, audio_tagging_loss=0.01777, over 14428.00 frames. ], tot_loss[loss=0.07494, simple_loss=0.08952, pruned_loss=0.0127, audio_tagging_loss=0.01748, over 694713.51 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:42:51,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3447173.3333333335, ans=0.125 2023-11-28 09:43:03,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447240.0, ans=0.1 2023-11-28 09:43:08,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3447240.0, ans=0.125 2023-11-28 09:43:12,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3447240.0, ans=0.0 2023-11-28 09:43:14,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3447306.6666666665, ans=0.2 2023-11-28 09:43:16,888 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-28 09:43:24,516 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:43:27,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.61 vs. 
limit=22.5 2023-11-28 09:43:44,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.529e+01 9.824e+01 1.052e+02 1.128e+02 1.642e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-28 09:43:45,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3447440.0, ans=0.125 2023-11-28 09:43:50,424 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 100, loss[loss=0.08553, simple_loss=0.1096, pruned_loss=0.01601, audio_tagging_loss=0.01471, over 15868.00 frames. ], tot_loss[loss=0.07204, simple_loss=0.08764, pruned_loss=0.0116, audio_tagging_loss=0.01662, over 1213634.96 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:43:52,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3447506.6666666665, ans=0.125 2023-11-28 09:44:00,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-28 09:44:04,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2023-11-28 09:44:09,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-28 09:44:11,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-28 09:44:12,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3447640.0, ans=0.0 2023-11-28 09:44:15,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-28 09:44:19,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3447640.0, ans=0.125 2023-11-28 09:44:22,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3447640.0, ans=0.125 2023-11-28 09:44:24,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3447706.6666666665, ans=0.125 2023-11-28 09:44:28,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3447706.6666666665, ans=0.125 2023-11-28 09:44:37,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2023-11-28 09:44:45,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3447773.3333333335, ans=0.125 2023-11-28 09:44:47,937 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 150, loss[loss=0.09971, simple_loss=0.1415, pruned_loss=0.02321, audio_tagging_loss=0.005774, over 17131.00 frames. ], tot_loss[loss=0.0708, simple_loss=0.08863, pruned_loss=0.01185, audio_tagging_loss=0.01464, over 1626621.95 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:45:02,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447906.6666666665, ans=0.1 2023-11-28 09:45:14,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-28 09:45:16,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3447973.3333333335, ans=0.0 2023-11-28 09:45:23,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2023-11-28 09:45:38,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-28 09:45:39,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3448106.6666666665, ans=0.125 2023-11-28 09:45:41,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 9.000e+01 9.478e+01 1.042e+02 1.328e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 09:45:46,297 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 200, loss[loss=0.06526, simple_loss=0.09471, pruned_loss=0.01013, audio_tagging_loss=0.007779, over 15809.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.08868, pruned_loss=0.01186, audio_tagging_loss=0.01288, over 1942213.42 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:46:11,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-28 09:46:23,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3448373.3333333335, ans=0.125 2023-11-28 09:46:29,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448373.3333333335, ans=0.1 2023-11-28 09:46:30,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3448373.3333333335, ans=0.2 2023-11-28 09:46:43,891 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 250, loss[loss=0.06205, simple_loss=0.09171, pruned_loss=0.008026, audio_tagging_loss=0.008165, over 14959.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.08976, pruned_loss=0.01235, audio_tagging_loss=0.01163, over 2185999.78 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:47:03,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. 
limit=10.0 2023-11-28 09:47:09,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-28 09:47:13,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3448640.0, ans=0.025 2023-11-28 09:47:35,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3448773.3333333335, ans=0.125 2023-11-28 09:47:36,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 9.287e+01 9.816e+01 1.058e+02 1.436e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-28 09:47:41,539 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 300, loss[loss=0.07379, simple_loss=0.09578, pruned_loss=0.01563, audio_tagging_loss=0.01028, over 15141.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.09133, pruned_loss=0.01258, audio_tagging_loss=0.01073, over 2377629.88 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:47:46,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448840.0, ans=0.1 2023-11-28 09:47:50,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.54 vs. limit=10.0 2023-11-28 09:47:54,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3448906.6666666665, ans=0.0 2023-11-28 09:48:07,246 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517350 2023-11-28 09:48:39,233 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 350, loss[loss=0.05474, simple_loss=0.07164, pruned_loss=0.009016, audio_tagging_loss=0.009903, over 16170.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09157, pruned_loss=0.0125, audio_tagging_loss=0.01024, over 2534294.09 frames. ], batch size: 64, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:48:45,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3449173.3333333335, ans=0.1 2023-11-28 09:48:51,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3449240.0, ans=0.125 2023-11-28 09:48:52,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3449240.0, ans=0.0 2023-11-28 09:48:55,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3449240.0, ans=0.125 2023-11-28 09:49:04,305 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517400 2023-11-28 09:49:32,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.082e+01 9.709e+01 1.033e+02 1.269e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 09:49:35,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3449440.0, ans=0.1 2023-11-28 09:49:37,634 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 400, loss[loss=0.06233, simple_loss=0.08444, pruned_loss=0.01013, audio_tagging_loss=0.009976, over 15941.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09067, pruned_loss=0.01237, audio_tagging_loss=0.00991, over 2648155.39 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:50:03,341 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517450 2023-11-28 09:50:13,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2023-11-28 09:50:14,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3449706.6666666665, ans=0.125 2023-11-28 09:50:22,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3449773.3333333335, ans=0.125 2023-11-28 09:50:25,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3449773.3333333335, ans=0.125 2023-11-28 09:50:34,881 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 450, loss[loss=0.07261, simple_loss=0.1013, pruned_loss=0.01148, audio_tagging_loss=0.0105, over 14803.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09089, pruned_loss=0.01226, audio_tagging_loss=0.009609, over 2740886.29 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:50:54,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=8.0 2023-11-28 09:51:00,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517500 2023-11-28 09:51:06,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3449973.3333333335, ans=0.125 2023-11-28 09:51:06,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=12.0 2023-11-28 09:51:29,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.576e+01 9.362e+01 1.011e+02 1.317e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 09:51:32,743 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 500, loss[loss=0.06134, simple_loss=0.0843, pruned_loss=0.01077, audio_tagging_loss=0.008416, over 15410.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09044, pruned_loss=0.01224, audio_tagging_loss=0.00927, over 2807741.68 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:51:32,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3450173.3333333335, ans=0.2 2023-11-28 09:51:36,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1 2023-11-28 09:51:50,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3450240.0, ans=0.05 2023-11-28 09:51:58,191 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517550 2023-11-28 09:52:07,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3450373.3333333335, ans=0.125 2023-11-28 09:52:30,028 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 550, loss[loss=0.06545, simple_loss=0.09247, pruned_loss=0.01206, audio_tagging_loss=0.007156, over 15299.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0892, pruned_loss=0.01202, audio_tagging_loss=0.009085, over 2861934.05 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:52:39,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=12.0 2023-11-28 09:52:42,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2023-11-28 09:52:55,467 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517600 2023-11-28 09:52:56,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2023-11-28 09:53:24,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.868e+01 9.461e+01 1.003e+02 1.214e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 09:53:27,500 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 600, loss[loss=0.06738, simple_loss=0.09075, pruned_loss=0.01352, audio_tagging_loss=0.008481, over 15067.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08925, pruned_loss=0.0121, audio_tagging_loss=0.009105, over 2900750.80 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:53:31,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3450840.0, ans=0.125 2023-11-28 09:53:41,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3450906.6666666665, ans=0.5 2023-11-28 09:53:42,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3450906.6666666665, ans=0.125 2023-11-28 09:53:53,167 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517650 2023-11-28 09:53:59,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3450973.3333333335, ans=0.125 2023-11-28 09:54:05,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2023-11-28 09:54:17,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3451106.6666666665, ans=0.125 2023-11-28 09:54:20,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3451106.6666666665, ans=0.95 2023-11-28 09:54:25,030 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 650, loss[loss=0.08435, simple_loss=0.1247, pruned_loss=0.01512, audio_tagging_loss=0.006895, over 15761.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0896, pruned_loss=0.01233, audio_tagging_loss=0.009044, over 2934565.39 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:54:37,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3451240.0, ans=0.2 2023-11-28 09:54:41,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3451240.0, ans=0.125 2023-11-28 09:54:50,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517700 2023-11-28 09:55:15,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-28 09:55:17,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451440.0, ans=0.1 2023-11-28 09:55:18,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.000e+01 9.495e+01 1.012e+02 1.235e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:55:21,740 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 700, loss[loss=0.08823, simple_loss=0.1267, pruned_loss=0.0185, audio_tagging_loss=0.006349, over 15883.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08946, pruned_loss=0.01217, audio_tagging_loss=0.009013, over 2960690.02 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:55:30,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3451506.6666666665, ans=0.0 2023-11-28 09:55:46,369 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517750 2023-11-28 09:55:52,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3451640.0, ans=0.125 2023-11-28 09:55:54,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2023-11-28 09:55:58,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-28 09:56:18,706 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 750, loss[loss=0.07935, simple_loss=0.1158, pruned_loss=0.01374, audio_tagging_loss=0.007697, over 15487.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0905, pruned_loss=0.01244, audio_tagging_loss=0.008885, over 2978353.42 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:56:20,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3451840.0, ans=0.125 2023-11-28 09:56:28,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.05 vs. 
limit=12.0 2023-11-28 09:56:34,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3451906.6666666665, ans=0.0 2023-11-28 09:56:44,436 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517800 2023-11-28 09:57:13,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.892e+01 9.576e+01 1.074e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 09:57:13,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3452106.6666666665, ans=0.0 2023-11-28 09:57:14,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3452106.6666666665, ans=0.125 2023-11-28 09:57:16,375 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 800, loss[loss=0.07063, simple_loss=0.09745, pruned_loss=0.01225, audio_tagging_loss=0.00966, over 15946.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09052, pruned_loss=0.01244, audio_tagging_loss=0.008914, over 2995208.40 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:57:23,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3452173.3333333335, ans=0.0 2023-11-28 09:57:25,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3452173.3333333335, ans=0.0 2023-11-28 09:57:28,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3452240.0, ans=0.125 2023-11-28 09:57:42,716 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517850 2023-11-28 09:57:43,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2023-11-28 09:57:47,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3452306.6666666665, ans=0.0 2023-11-28 09:57:51,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3452373.3333333335, ans=0.2 2023-11-28 09:58:14,589 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 850, loss[loss=0.03568, simple_loss=0.03646, pruned_loss=0.00384, audio_tagging_loss=0.01361, over 14404.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09008, pruned_loss=0.01245, audio_tagging_loss=0.008998, over 3009353.15 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:58:14,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3452506.6666666665, ans=15.0 2023-11-28 09:58:27,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3452573.3333333335, ans=0.2 2023-11-28 09:58:30,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3452573.3333333335, ans=0.1 2023-11-28 09:58:40,034 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517900 2023-11-28 09:59:10,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.934e+01 9.404e+01 1.018e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 09:59:13,137 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 900, loss[loss=0.04734, simple_loss=0.06886, pruned_loss=0.006679, audio_tagging_loss=0.006232, over 13830.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09029, pruned_loss=0.01234, audio_tagging_loss=0.009022, over 3014582.24 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:59:31,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3452906.6666666665, ans=0.0 2023-11-28 09:59:37,873 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 517950 2023-11-28 09:59:49,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.99 vs. limit=15.0 2023-11-28 09:59:56,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3453040.0, ans=10.0 2023-11-28 09:59:58,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3453106.6666666665, ans=0.0 2023-11-28 10:00:03,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.34 vs. limit=10.0 2023-11-28 10:00:03,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.87 vs. limit=15.0 2023-11-28 10:00:09,551 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 950, loss[loss=0.059, simple_loss=0.0741, pruned_loss=0.01282, audio_tagging_loss=0.009133, over 16488.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09041, pruned_loss=0.01233, audio_tagging_loss=0.008918, over 3022119.09 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:00:12,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3453173.3333333335, ans=0.125 2023-11-28 10:00:26,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3453240.0, ans=0.0 2023-11-28 10:00:26,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453240.0, ans=0.1 2023-11-28 10:00:35,441 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518000 2023-11-28 10:00:45,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453373.3333333335, ans=0.1 2023-11-28 10:00:59,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3453440.0, ans=0.0 2023-11-28 10:01:03,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3453440.0, ans=0.0 2023-11-28 10:01:05,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.698e+01 9.447e+01 1.001e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:01:07,047 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1000, loss[loss=0.08311, simple_loss=0.1187, pruned_loss=0.0166, audio_tagging_loss=0.00715, over 16038.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08998, pruned_loss=0.0122, audio_tagging_loss=0.008889, over 3026544.07 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:01:08,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3453506.6666666665, ans=0.09899494936611666 2023-11-28 10:01:14,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3453506.6666666665, ans=0.125 2023-11-28 10:01:18,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2023-11-28 10:01:21,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5 2023-11-28 10:01:30,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3453640.0, ans=0.0 2023-11-28 10:01:30,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3453640.0, ans=0.07 2023-11-28 10:01:32,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518050 2023-11-28 10:01:33,735 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:01:40,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.62 vs. 
limit=15.0 2023-11-28 10:01:53,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-28 10:01:56,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-28 10:02:05,485 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1050, loss[loss=0.07689, simple_loss=0.101, pruned_loss=0.01681, audio_tagging_loss=0.00957, over 15603.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09011, pruned_loss=0.01225, audio_tagging_loss=0.008745, over 3036159.10 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:02:10,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3453840.0, ans=0.1 2023-11-28 10:02:13,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.90 vs. limit=10.0 2023-11-28 10:02:24,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3453906.6666666665, ans=0.125 2023-11-28 10:02:30,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518100 2023-11-28 10:02:37,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5 2023-11-28 10:02:56,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2023-11-28 10:03:01,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.979e+01 9.409e+01 9.986e+01 1.298e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:03:02,675 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1100, loss[loss=0.06294, simple_loss=0.08486, pruned_loss=0.009813, audio_tagging_loss=0.0107, over 15631.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09002, pruned_loss=0.01231, audio_tagging_loss=0.008625, over 3051820.97 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:03:08,620 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:03:10,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3454173.3333333335, ans=0.2 2023-11-28 10:03:14,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3454240.0, ans=0.0 2023-11-28 10:03:15,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.52 vs. 
limit=15.0 2023-11-28 10:03:24,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3454306.6666666665, ans=0.1 2023-11-28 10:03:28,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518150 2023-11-28 10:03:31,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3454306.6666666665, ans=0.0 2023-11-28 10:03:35,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3454373.3333333335, ans=0.125 2023-11-28 10:03:35,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3454373.3333333335, ans=0.05 2023-11-28 10:03:53,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3454440.0, ans=0.2 2023-11-28 10:03:59,647 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1150, loss[loss=0.05204, simple_loss=0.06726, pruned_loss=0.01005, audio_tagging_loss=0.008356, over 14865.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08955, pruned_loss=0.01233, audio_tagging_loss=0.00865, over 3047005.05 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:04:11,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3454573.3333333335, ans=0.125 2023-11-28 10:04:24,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2023-11-28 10:04:24,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518200 2023-11-28 10:04:36,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3454706.6666666665, ans=0.1 2023-11-28 10:04:39,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3454706.6666666665, ans=0.0 2023-11-28 10:04:40,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3454706.6666666665, ans=0.0 2023-11-28 10:04:49,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3454773.3333333335, ans=0.2 2023-11-28 10:04:57,109 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.839e+01 9.353e+01 1.036e+02 1.275e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 10:04:57,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3454840.0, ans=0.125 2023-11-28 10:04:58,225 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1200, loss[loss=0.06921, simple_loss=0.09909, pruned_loss=0.01049, audio_tagging_loss=0.009178, over 15405.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09023, pruned_loss=0.01238, audio_tagging_loss=0.008536, over 3045267.45 frames. 
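Editor's note on the `optim.py:476` entries: the printed relationship is checkable from the numbers themselves. With `Clipping_scale=2.0` and grad-norm quartiles `7.756e+01 8.839e+01 9.353e+01 1.036e+02 1.275e+02`, the threshold `1.871e+02` is exactly 2.0 x the median (2 x 93.53 = 187.06), so the clipping threshold appears to be the clipping scale times the median of recently observed gradient norms. The sketch below reproduces only that printed relationship; the real optimizer bookkeeping is more involved.

```python
import random
import statistics

def clip_threshold(recent_norms, clipping_scale=2.0):
    """Threshold = clipping_scale x median of recent gradient norms,
    matching threshold=1.871e+02 for median 9.353e+01 above."""
    return clipping_scale * statistics.median(recent_norms)

def quartiles(norms):
    """Approximate min/Q1/median/Q3/max, as printed in the log."""
    s = sorted(norms)
    n = len(s)
    return [s[0], s[n // 4], s[n // 2], s[(3 * n) // 4], s[-1]]

random.seed(0)
norms = [random.gauss(93.0, 10.0) for _ in range(200)]
thr = clip_threshold(norms)
clipped = sum(1 for g in norms if g > thr)
print(f"quartiles {quartiles(norms)}, threshold={thr:.4g}, "
      f"percent-clipped={100.0 * clipped / len(norms):.1f}")
```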
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:05:02,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3454840.0, ans=0.0 2023-11-28 10:05:05,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3454840.0, ans=0.125 2023-11-28 10:05:22,762 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518250 2023-11-28 10:05:29,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2023-11-28 10:05:31,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3455040.0, ans=0.95 2023-11-28 10:05:34,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3455040.0, ans=0.125 2023-11-28 10:05:54,772 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1250, loss[loss=0.06613, simple_loss=0.08608, pruned_loss=0.01511, audio_tagging_loss=0.007985, over 16294.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08981, pruned_loss=0.01226, audio_tagging_loss=0.008573, over 3043112.50 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:06:20,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518300 2023-11-28 10:06:27,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0 2023-11-28 10:06:31,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2023-11-28 10:06:40,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3455440.0, ans=0.125 2023-11-28 10:06:42,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3455440.0, ans=0.0 2023-11-28 10:06:42,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-28 10:06:50,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.649e+01 9.225e+01 9.865e+01 1.174e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 10:06:51,957 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1300, loss[loss=0.06595, simple_loss=0.08925, pruned_loss=0.0137, audio_tagging_loss=0.007628, over 16420.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08959, pruned_loss=0.0122, audio_tagging_loss=0.00848, over 3041648.69 frames. 
], batch size: 63, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:07:07,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3455573.3333333335, ans=0.2 2023-11-28 10:07:17,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518350 2023-11-28 10:07:38,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3455773.3333333335, ans=0.125 2023-11-28 10:07:42,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3455773.3333333335, ans=0.125 2023-11-28 10:07:46,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-28 10:07:49,284 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1350, loss[loss=0.06359, simple_loss=0.08021, pruned_loss=0.01423, audio_tagging_loss=0.009259, over 14756.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08964, pruned_loss=0.0121, audio_tagging_loss=0.008557, over 3047021.33 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:07:57,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3455840.0, ans=0.0 2023-11-28 10:08:06,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3455906.6666666665, ans=0.125 2023-11-28 10:08:08,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=12.0 2023-11-28 10:08:14,037 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518400 2023-11-28 10:08:24,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3456040.0, ans=0.0 2023-11-28 10:08:29,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3456040.0, ans=0.035 2023-11-28 10:08:33,613 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:08:35,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3456106.6666666665, ans=0.125 2023-11-28 10:08:45,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.591e+01 9.504e+01 1.020e+02 1.211e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 10:08:46,237 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1400, loss[loss=0.05089, simple_loss=0.06741, pruned_loss=0.008314, audio_tagging_loss=0.008871, over 15354.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08975, pruned_loss=0.01226, audio_tagging_loss=0.008645, over 3048530.58 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:09:11,870 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518450 2023-11-28 10:09:22,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3456373.3333333335, ans=0.1 2023-11-28 10:09:34,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3456440.0, ans=0.125 2023-11-28 10:09:43,530 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1450, loss[loss=0.07044, simple_loss=0.09536, pruned_loss=0.01469, audio_tagging_loss=0.008063, over 14017.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08968, pruned_loss=0.01221, audio_tagging_loss=0.00867, over 3056012.18 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:09:47,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3456506.6666666665, ans=0.5 2023-11-28 10:09:51,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3456506.6666666665, ans=0.0 2023-11-28 10:10:03,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3456573.3333333335, ans=0.125 2023-11-28 10:10:06,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3456640.0, ans=0.0 2023-11-28 10:10:08,639 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518500 2023-11-28 10:10:16,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3456640.0, ans=0.125 2023-11-28 10:10:38,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3456773.3333333335, ans=0.125 2023-11-28 10:10:39,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.920e+01 9.408e+01 1.027e+02 1.400e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:10:41,240 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1500, loss[loss=0.0657, simple_loss=0.08525, pruned_loss=0.01372, audio_tagging_loss=0.009357, over 16746.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09023, pruned_loss=0.0123, audio_tagging_loss=0.008715, over 3056192.85 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:10:48,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3456840.0, ans=0.125 2023-11-28 10:11:06,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518550 2023-11-28 10:11:19,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3457040.0, ans=0.125 2023-11-28 10:11:33,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3457106.6666666665, ans=0.2 2023-11-28 10:11:37,957 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1550, loss[loss=0.09059, simple_loss=0.1348, pruned_loss=0.01675, audio_tagging_loss=0.006436, over 15513.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08983, pruned_loss=0.01223, audio_tagging_loss=0.0088, over 3054888.62 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:11:40,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3457173.3333333335, ans=0.1 2023-11-28 10:11:43,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3457173.3333333335, ans=0.125 2023-11-28 10:11:55,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-11-28 10:12:03,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518600 2023-11-28 10:12:12,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3457373.3333333335, ans=0.125 2023-11-28 10:12:30,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3457440.0, ans=0.0 2023-11-28 10:12:35,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.956e+01 9.382e+01 1.022e+02 1.472e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 10:12:36,222 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1600, loss[loss=0.045, simple_loss=0.05257, pruned_loss=0.005807, audio_tagging_loss=0.01291, over 14878.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08986, pruned_loss=0.01223, audio_tagging_loss=0.008864, over 3052141.38 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:12:38,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457506.6666666665, ans=0.1 2023-11-28 10:12:55,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5 2023-11-28 10:13:00,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457640.0, ans=0.1 2023-11-28 10:13:01,284 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518650 2023-11-28 10:13:01,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=6.0 2023-11-28 10:13:21,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2023-11-28 10:13:22,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3457773.3333333335, ans=0.1 2023-11-28 10:13:25,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3457773.3333333335, ans=0.125 2023-11-28 10:13:33,626 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1650, loss[loss=0.06399, simple_loss=0.08517, pruned_loss=0.01335, audio_tagging_loss=0.008052, over 15197.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08968, pruned_loss=0.01221, audio_tagging_loss=0.008909, over 3053177.07 frames. 
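Editor's note on the `scaling.py:1022` entries: each `Whitening` line compares a per-module statistic against a limit (`metric=12.22 vs. limit=22.5`), implying a soft constraint that only activates once the metric exceeds its limit. The exact metric definition in scaling.py is not shown in this log; the sketch below uses one plausible reading (largest eigenvalue of the feature covariance over the mean eigenvalue, which is 1.0 for perfectly white features) purely to illustrate the metric-vs-limit check.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """One plausible whitening measure: ratio of the largest eigenvalue
    of the feature covariance to the mean eigenvalue. Assumption, not
    necessarily the definition used in scaling.py."""
    x = x.reshape(-1, x.shape[-1])
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.max() / eigs.mean()).item()

x = torch.randn(1000, 256)  # near-white input: metric stays close to 1
metric, limit = whitening_metric(x), 22.5
print(f"metric={metric:.2f} vs. limit={limit}")
if metric > limit:
    print("would apply a whitening penalty to the gradients")
```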
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:13:58,874 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518700 2023-11-28 10:14:04,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3457973.3333333335, ans=0.125 2023-11-28 10:14:18,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3458040.0, ans=10.0 2023-11-28 10:14:23,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3458106.6666666665, ans=0.0 2023-11-28 10:14:30,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.751e+01 9.360e+01 1.005e+02 1.461e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 10:14:31,138 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1700, loss[loss=0.05467, simple_loss=0.07714, pruned_loss=0.00677, audio_tagging_loss=0.009332, over 13748.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08934, pruned_loss=0.0121, audio_tagging_loss=0.008917, over 3050782.17 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:14:41,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3458240.0, ans=0.0 2023-11-28 10:14:47,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3458240.0, ans=0.125 2023-11-28 10:14:56,412 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518750 2023-11-28 10:15:04,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2023-11-28 10:15:21,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2023-11-28 10:15:28,846 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1750, loss[loss=0.05008, simple_loss=0.0711, pruned_loss=0.005415, audio_tagging_loss=0.009111, over 14535.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08916, pruned_loss=0.01205, audio_tagging_loss=0.008874, over 3045464.92 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:15:36,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=12.0 2023-11-28 10:15:37,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3458506.6666666665, ans=0.09899494936611666 2023-11-28 10:15:40,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3458573.3333333335, ans=0.5 2023-11-28 10:15:51,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. 
limit=15.0 2023-11-28 10:15:54,045 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518800 2023-11-28 10:16:01,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3458706.6666666665, ans=0.1 2023-11-28 10:16:02,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3458706.6666666665, ans=0.0 2023-11-28 10:16:13,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3458773.3333333335, ans=0.125 2023-11-28 10:16:13,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3458773.3333333335, ans=0.0 2023-11-28 10:16:25,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.578e+01 9.174e+01 9.766e+01 1.256e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-28 10:16:25,459 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1800, loss[loss=0.07551, simple_loss=0.1026, pruned_loss=0.01802, audio_tagging_loss=0.0062, over 14889.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08956, pruned_loss=0.01205, audio_tagging_loss=0.00871, over 3049725.36 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:16:33,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3458840.0, ans=0.0 2023-11-28 10:16:49,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3458973.3333333335, ans=0.2 2023-11-28 10:16:50,460 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518850 2023-11-28 10:16:55,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3458973.3333333335, ans=0.125 2023-11-28 10:16:59,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3459040.0, ans=0.025 2023-11-28 10:17:23,172 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1850, loss[loss=0.07218, simple_loss=0.0991, pruned_loss=0.01671, audio_tagging_loss=0.005916, over 14729.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08926, pruned_loss=0.01216, audio_tagging_loss=0.008628, over 3049729.39 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:17:43,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3459240.0, ans=0.125 2023-11-28 10:17:43,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3459240.0, ans=0.0 2023-11-28 10:17:45,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3459306.6666666665, ans=0.125 2023-11-28 10:17:47,776 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-28 10:18:00,119 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:18:14,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3459440.0, ans=0.125 2023-11-28 10:18:19,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.665e+01 9.197e+01 1.005e+02 1.247e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 10:18:19,473 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1900, loss[loss=0.07632, simple_loss=0.1064, pruned_loss=0.01385, audio_tagging_loss=0.009254, over 15193.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08878, pruned_loss=0.01205, audio_tagging_loss=0.008605, over 3054241.93 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:18:34,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3459573.3333333335, ans=0.2 2023-11-28 10:18:45,665 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-28 10:18:51,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.36 vs. limit=10.0 2023-11-28 10:18:52,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3459640.0, ans=0.04949747468305833 2023-11-28 10:18:52,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3459640.0, ans=0.125 2023-11-28 10:18:58,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3459706.6666666665, ans=0.2 2023-11-28 10:19:11,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3459773.3333333335, ans=0.2 2023-11-28 10:19:16,891 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 1950, loss[loss=0.05882, simple_loss=0.07327, pruned_loss=0.01339, audio_tagging_loss=0.008797, over 14850.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08879, pruned_loss=0.01209, audio_tagging_loss=0.008569, over 3060352.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:19:18,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.10 vs. 
limit=12.0 2023-11-28 10:19:25,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3459840.0, ans=0.2 2023-11-28 10:19:30,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3459906.6666666665, ans=0.125 2023-11-28 10:19:41,700 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-28 10:19:41,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3459973.3333333335, ans=0.05 2023-11-28 10:19:54,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3460040.0, ans=0.125 2023-11-28 10:20:01,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. limit=10.0 2023-11-28 10:20:06,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2023-11-28 10:20:07,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3460106.6666666665, ans=0.2 2023-11-28 10:20:09,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3460106.6666666665, ans=0.125 2023-11-28 10:20:14,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.984e+01 9.500e+01 1.035e+02 1.289e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 10:20:14,559 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2000, loss[loss=0.04605, simple_loss=0.05671, pruned_loss=0.008043, audio_tagging_loss=0.009648, over 15409.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08869, pruned_loss=0.01213, audio_tagging_loss=0.008697, over 3060255.62 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:20:17,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3460173.3333333335, ans=0.0 2023-11-28 10:20:18,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.31 vs. 
limit=15.0 2023-11-28 10:20:37,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3460306.6666666665, ans=0.125 2023-11-28 10:20:39,480 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-28 10:20:57,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3460373.3333333335, ans=0.125 2023-11-28 10:21:07,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3460440.0, ans=0.125 2023-11-28 10:21:08,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3460440.0, ans=0.125 2023-11-28 10:21:08,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3460440.0, ans=0.1 2023-11-28 10:21:09,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460440.0, ans=0.1 2023-11-28 10:21:11,328 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2050, loss[loss=0.05374, simple_loss=0.07332, pruned_loss=0.007343, audio_tagging_loss=0.00974, over 14575.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08881, pruned_loss=0.01213, audio_tagging_loss=0.008616, over 3056804.73 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:21:12,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3460506.6666666665, ans=0.2 2023-11-28 10:21:38,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-28 10:21:45,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3460640.0, ans=0.125 2023-11-28 10:21:55,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3460706.6666666665, ans=0.125 2023-11-28 10:22:09,705 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2100, loss[loss=0.06809, simple_loss=0.09529, pruned_loss=0.01195, audio_tagging_loss=0.008497, over 15282.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08926, pruned_loss=0.01228, audio_tagging_loss=0.00851, over 3056959.74 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:22:10,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.721e+01 9.366e+01 1.002e+02 1.628e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 10:22:27,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3460906.6666666665, ans=0.125 2023-11-28 10:22:35,476 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-28 10:23:00,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3461106.6666666665, ans=0.04949747468305833 2023-11-28 10:23:03,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3461106.6666666665, ans=0.0 2023-11-28 10:23:04,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. 
limit=22.5 2023-11-28 10:23:05,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3461106.6666666665, ans=0.0 2023-11-28 10:23:08,718 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2150, loss[loss=0.06008, simple_loss=0.0853, pruned_loss=0.01132, audio_tagging_loss=0.006112, over 15340.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08909, pruned_loss=0.01226, audio_tagging_loss=0.008599, over 3055209.72 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:23:08,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3461173.3333333335, ans=0.0 2023-11-28 10:23:11,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2023-11-28 10:23:16,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-11-28 10:23:20,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3461240.0, ans=0.0 2023-11-28 10:23:22,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3461240.0, ans=0.2 2023-11-28 10:23:33,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-28 10:23:34,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-28 10:23:40,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-28 10:23:48,073 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:23:56,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3461440.0, ans=0.0 2023-11-28 10:24:05,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3461440.0, ans=0.0 2023-11-28 10:24:07,079 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2200, loss[loss=0.04366, simple_loss=0.05127, pruned_loss=0.007351, audio_tagging_loss=0.01068, over 14026.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08939, pruned_loss=0.01229, audio_tagging_loss=0.008586, over 3048109.50 frames. 
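Editor's note on the `train_asr.py:1481` WARNING entries: the excluded AudioSet cuts are 1-second dummy-text placeholders with 100 frames, which shrink to 23 frames after 4x subsampling while their BPE transcription has 24 tokens; a transducer alignment needs at least as many frames as tokens, so the cut cannot be trained on. A minimal sketch of that filter, assuming an illustrative subsampling formula (the exact margin and formula in train_asr.py may differ):

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    """Keep a cut only if it still has at least as many frames as
    tokens after subsampling; the warned cuts above have
    100 -> 23 frames but 24 tokens, so they are dropped."""
    frames_after = (num_frames - 7) // subsampling_factor  # illustrative: 100 -> 23
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ..." warning
```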
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:24:08,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.940e+01 9.417e+01 1.003e+02 1.474e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 10:24:10,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3461506.6666666665, ans=0.0 2023-11-28 10:24:17,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461573.3333333335, ans=0.1 2023-11-28 10:24:23,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-28 10:24:27,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3461573.3333333335, ans=0.2 2023-11-28 10:24:33,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-28 10:24:45,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3461706.6666666665, ans=10.0 2023-11-28 10:25:03,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-28 10:25:04,101 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2250, loss[loss=0.0525, simple_loss=0.0662, pruned_loss=0.00953, audio_tagging_loss=0.009871, over 14727.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08921, pruned_loss=0.01234, audio_tagging_loss=0.008652, over 3051690.88 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:25:16,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1 2023-11-28 10:25:28,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-11-28 10:25:29,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-28 10:25:29,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3461973.3333333335, ans=0.1 2023-11-28 10:25:32,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461973.3333333335, ans=0.1 2023-11-28 10:25:41,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3462040.0, ans=0.125 2023-11-28 10:25:45,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2023-11-28 10:26:02,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-28 10:26:02,942 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2300, loss[loss=0.04646, simple_loss=0.05739, pruned_loss=0.007146, audio_tagging_loss=0.01062, over 14349.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08945, pruned_loss=0.01249, audio_tagging_loss=0.008682, over 3045215.14 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:26:04,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.619e+01 8.792e+01 9.298e+01 1.006e+02 1.302e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:26:28,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-28 10:26:32,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3462306.6666666665, ans=0.0 2023-11-28 10:26:33,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3462306.6666666665, ans=0.0 2023-11-28 10:26:37,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462373.3333333335, ans=0.1 2023-11-28 10:26:39,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3462373.3333333335, ans=0.1 2023-11-28 10:26:52,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3462440.0, ans=0.1 2023-11-28 10:26:53,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3462440.0, ans=0.125 2023-11-28 10:26:53,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2023-11-28 10:26:56,170 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:27:00,539 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2350, loss[loss=0.07056, simple_loss=0.0838, pruned_loss=0.01836, audio_tagging_loss=0.0103, over 14939.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08985, pruned_loss=0.01253, audio_tagging_loss=0.008691, over 3042153.07 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:27:02,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3462506.6666666665, ans=0.125 2023-11-28 10:27:13,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3462573.3333333335, ans=0.125 2023-11-28 10:27:22,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3462640.0, ans=0.125 2023-11-28 10:27:25,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-28 10:27:27,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3462640.0, ans=0.04949747468305833 2023-11-28 10:27:29,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3462640.0, ans=0.2 2023-11-28 10:27:48,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3462773.3333333335, ans=0.0 2023-11-28 10:27:59,278 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2400, loss[loss=0.06053, simple_loss=0.08642, pruned_loss=0.009366, audio_tagging_loss=0.007951, over 15327.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09025, pruned_loss=0.01247, audio_tagging_loss=0.008782, over 3044715.27 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:28:00,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.676e+01 9.385e+01 1.010e+02 1.342e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 10:28:06,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3462840.0, ans=0.125 2023-11-28 10:28:08,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3462840.0, ans=0.035 2023-11-28 10:28:24,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-28 10:28:25,727 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-28 10:28:29,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-28 10:28:44,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3463040.0, ans=0.1 2023-11-28 10:28:58,258 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2450, loss[loss=0.05767, simple_loss=0.08132, pruned_loss=0.008461, audio_tagging_loss=0.008547, over 15896.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08928, pruned_loss=0.01228, audio_tagging_loss=0.008858, over 3040989.59 frames. 
], batch size: 62, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:23,763 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-28 10:29:46,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3463440.0, ans=0.125 2023-11-28 10:29:50,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3463440.0, ans=0.125 2023-11-28 10:29:56,353 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2500, loss[loss=0.08114, simple_loss=0.1182, pruned_loss=0.01552, audio_tagging_loss=0.006532, over 15347.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08966, pruned_loss=0.01236, audio_tagging_loss=0.00889, over 3042171.44 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:57,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.648e+01 9.240e+01 1.001e+02 1.352e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 10:30:12,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-28 10:30:21,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-28 10:30:54,580 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2550, loss[loss=0.0541, simple_loss=0.06665, pruned_loss=0.007283, audio_tagging_loss=0.01349, over 15459.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08907, pruned_loss=0.0123, audio_tagging_loss=0.00883, over 3043981.72 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:31:06,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=22.5 2023-11-28 10:31:07,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3463906.6666666665, ans=0.2 2023-11-28 10:31:19,995 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-28 10:31:24,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3463973.3333333335, ans=0.125 2023-11-28 10:31:38,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3464040.0, ans=0.0 2023-11-28 10:31:53,572 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2600, loss[loss=0.06081, simple_loss=0.08162, pruned_loss=0.0115, audio_tagging_loss=0.008504, over 16030.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08925, pruned_loss=0.01236, audio_tagging_loss=0.00868, over 3039550.47 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:31:56,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.673e+01 9.368e+01 9.896e+01 1.178e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 10:32:03,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3464173.3333333335, ans=0.125 2023-11-28 10:32:19,476 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-28 10:32:45,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. 
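limit=15.0

The `[scaling.py:1022]` "Whitening" records track how far each module's activations are from having a white (isotropic) covariance; a corrective gradient only kicks in once the metric exceeds its limit, which is why every line reads "metric=X vs. limit=Y". The exact statistic is defined in icefall's scaling.py; the function below is a plausible stand-in using the eigenvalue-spread ratio num_channels * sum(e_i^2) / (sum e_i)^2, which is 1.0 for perfectly white features and grows as a few directions dominate. Treat the formula as an assumption for illustration.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations. Returns a whiteness
    statistic near its floor of 1.0 for isotropic features and larger
    when the covariance spectrum is peaked (assumed formula)."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]        # (C, C) sample covariance
    eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
    c = x.shape[1]
    return (c * (eigs ** 2).sum() / eigs.sum() ** 2).item()

# Correlated toy features score far above white Gaussian features.
feats = torch.randn(1000, 384) @ torch.randn(384, 384)
print(f"metric={whitening_metric(feats):.2f} vs. limit=22.5")
print(f"metric={whitening_metric(torch.randn(1000, 384)):.2f}")  # near 1
```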
2023-11-28 10:32:52,187 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2650, loss[loss=0.07821, simple_loss=0.1087, pruned_loss=0.015, audio_tagging_loss=0.008845, over 16188.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08987, pruned_loss=0.01228, audio_tagging_loss=0.008605, over 3048536.34 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:32:54,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3464506.6666666665, ans=0.2 2023-11-28 10:33:02,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2023-11-28 10:33:07,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-11-28 10:33:12,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3464573.3333333335, ans=0.0 2023-11-28 10:33:14,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3464640.0, ans=0.125 2023-11-28 10:33:17,820 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-28 10:33:24,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3464640.0, ans=0.2 2023-11-28 10:33:28,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3464706.6666666665, ans=0.04949747468305833 2023-11-28 10:33:30,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-28 10:33:39,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.40 vs. limit=15.0 2023-11-28 10:33:50,944 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2700, loss[loss=0.04917, simple_loss=0.0567, pruned_loss=0.00846, audio_tagging_loss=0.01236, over 16402.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09016, pruned_loss=0.01236, audio_tagging_loss=0.008562, over 3055549.20 frames.
], batch size: 64, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:33:54,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 9.167e+01 9.683e+01 1.022e+02 1.162e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 10:33:56,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3464840.0, ans=0.07 2023-11-28 10:33:59,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3464840.0, ans=0.125 2023-11-28 10:34:12,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3464906.6666666665, ans=0.125 2023-11-28 10:34:16,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-28 10:34:19,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3464973.3333333335, ans=0.125 2023-11-28 10:34:40,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3465106.6666666665, ans=0.2 2023-11-28 10:34:48,179 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2750, loss[loss=0.07869, simple_loss=0.1135, pruned_loss=0.01529, audio_tagging_loss=0.006652, over 16009.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09053, pruned_loss=0.01263, audio_tagging_loss=0.008575, over 3061024.21 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:34:51,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3465173.3333333335, ans=0.125 2023-11-28 10:35:14,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-28 10:35:15,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3465306.6666666665, ans=0.125 2023-11-28 10:35:37,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3465440.0, ans=0.125 2023-11-28 10:35:42,923 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:35:47,385 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2800, loss[loss=0.04945, simple_loss=0.06773, pruned_loss=0.007829, audio_tagging_loss=0.007757, over 14329.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09025, pruned_loss=0.01247, audio_tagging_loss=0.008545, over 3055718.34 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:35:49,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3465506.6666666665, ans=0.125 2023-11-28 10:35:50,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.532e+01 9.536e+01 1.008e+02 1.642e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 10:36:03,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3465573.3333333335, ans=0.125 2023-11-28 10:36:12,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-28 10:36:17,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3465640.0, ans=0.1 2023-11-28 10:36:38,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3465773.3333333335, ans=0.125 2023-11-28 10:36:39,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3465773.3333333335, ans=0.1 2023-11-28 10:36:43,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0 2023-11-28 10:36:45,209 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2850, loss[loss=0.07083, simple_loss=0.09496, pruned_loss=0.01445, audio_tagging_loss=0.008899, over 15085.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0897, pruned_loss=0.01238, audio_tagging_loss=0.008538, over 3054046.07 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:36:56,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3465906.6666666665, ans=0.125 2023-11-28 10:37:11,193 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-28 10:37:27,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3466040.0, ans=0.125 2023-11-28 10:37:35,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3466106.6666666665, ans=0.125 2023-11-28 10:37:39,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3466106.6666666665, ans=0.125 2023-11-28 10:37:43,617 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2900, loss[loss=0.06832, simple_loss=0.09787, pruned_loss=0.00967, audio_tagging_loss=0.009715, over 16608.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09064, pruned_loss=0.01258, audio_tagging_loss=0.008508, over 3055459.44 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:37:43,874 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:37:46,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.834e+01 9.612e+01 1.019e+02 1.318e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 10:38:01,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.07 vs. 
limit=15.0 2023-11-28 10:38:09,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-28 10:38:21,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3466373.3333333335, ans=0.125 2023-11-28 10:38:24,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-28 10:38:29,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3466440.0, ans=0.125 2023-11-28 10:38:42,315 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 2950, loss[loss=0.06385, simple_loss=0.08508, pruned_loss=0.01003, audio_tagging_loss=0.01128, over 14520.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09046, pruned_loss=0.01243, audio_tagging_loss=0.008565, over 3050291.37 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:38:44,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5 2023-11-28 10:38:45,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3466506.6666666665, ans=0.0 2023-11-28 10:39:08,018 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-28 10:39:15,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3466640.0, ans=0.1 2023-11-28 10:39:20,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-28 10:39:21,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-28 10:39:25,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3466706.6666666665, ans=0.125 2023-11-28 10:39:29,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3466706.6666666665, ans=0.125 2023-11-28 10:39:42,328 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3000, loss[loss=0.05439, simple_loss=0.06831, pruned_loss=0.01224, audio_tagging_loss=0.008003, over 14464.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09148, pruned_loss=0.01263, audio_tagging_loss=0.008491, over 3049451.58 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:39:42,330 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 10:40:14,921 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0399, 5.9100, 5.7098, 5.6556], device='cuda:1') 2023-11-28 10:40:18,158 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05741, simple_loss=0.05054, pruned_loss=0.005252, audio_tagging_loss=0.02689, over 4681554.00 frames. 
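The loss fields in these records fit a simple decomposition: the logged `loss` equals 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, both for per-batch values (batch 2250 above: 0.5 x 0.0662 + 0.00953 + 0.009871 ≈ 0.0525) and for this validation line (0.5 x 0.05054 + 0.005252 + 0.02689 ≈ 0.05741). The helper below restates that arithmetic; the parameter names are descriptive labels, and it is a hedged sketch in that the warm-up scheduling of the simple/pruned weighting in the real training loop is omitted.

```python
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    # Weighted sum matching the "loss[...]" fields in the log records.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# The validation record above: loss=0.05741
assert abs(combine_losses(0.05054, 0.005252, 0.02689) - 0.05741) < 1e-4
# Batch 2250 earlier in this section: loss=0.0525
assert abs(combine_losses(0.0662, 0.00953, 0.009871) - 0.0525) < 1e-4
```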
2023-11-28 10:40:18,159 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 10:40:21,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.904e+01 9.559e+01 1.030e+02 1.233e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:40:42,574 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-28 10:40:59,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3467040.0, ans=0.125 2023-11-28 10:41:09,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3467106.6666666665, ans=0.125 2023-11-28 10:41:10,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3467106.6666666665, ans=0.2 2023-11-28 10:41:15,707 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3050, loss[loss=0.05932, simple_loss=0.08627, pruned_loss=0.009275, audio_tagging_loss=0.006913, over 16008.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09075, pruned_loss=0.01233, audio_tagging_loss=0.008555, over 3043995.20 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:41:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3467173.3333333335, ans=0.5 2023-11-28 10:41:24,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3467173.3333333335, ans=0.125 2023-11-28 10:41:37,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3467306.6666666665, ans=0.0 2023-11-28 10:41:39,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=12.0 2023-11-28 10:41:41,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-28 10:41:48,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3467306.6666666665, ans=0.125 2023-11-28 10:41:53,544 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:42:02,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3467440.0, ans=0.2 2023-11-28 10:42:03,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3467440.0, ans=0.07 2023-11-28 10:42:13,279 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3100, loss[loss=0.0757, simple_loss=0.1038, pruned_loss=0.0152, audio_tagging_loss=0.008591, over 16528.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09102, pruned_loss=0.01234, audio_tagging_loss=0.008589, over 3052244.31 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:42:16,618 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.845e+01 9.349e+01 1.011e+02 1.262e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 10:42:24,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3467573.3333333335, ans=0.2 2023-11-28 10:42:40,044 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-28 10:42:42,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3467640.0, ans=0.125 2023-11-28 10:42:43,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3467640.0, ans=0.015 2023-11-28 10:42:49,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3467706.6666666665, ans=0.125 2023-11-28 10:43:04,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3467773.3333333335, ans=0.125 2023-11-28 10:43:11,812 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3150, loss[loss=0.06559, simple_loss=0.08337, pruned_loss=0.01502, audio_tagging_loss=0.008887, over 14043.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09112, pruned_loss=0.0124, audio_tagging_loss=0.008671, over 3050052.52 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:43:23,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3467906.6666666665, ans=0.2 2023-11-28 10:43:31,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=22.5 2023-11-28 10:43:31,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.20 vs. limit=22.5 2023-11-28 10:43:37,571 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-28 10:44:05,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3468106.6666666665, ans=0.125 2023-11-28 10:44:10,810 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3200, loss[loss=0.04956, simple_loss=0.06287, pruned_loss=0.008177, audio_tagging_loss=0.009947, over 14199.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09103, pruned_loss=0.0124, audio_tagging_loss=0.008774, over 3050355.37 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:44:14,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.853e+01 9.488e+01 1.043e+02 1.212e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 10:44:22,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2023-11-28 10:44:24,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. 
limit=15.0 2023-11-28 10:44:35,662 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-28 10:44:49,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3468373.3333333335, ans=0.09899494936611666 2023-11-28 10:45:07,192 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3250, loss[loss=0.07596, simple_loss=0.09692, pruned_loss=0.01735, audio_tagging_loss=0.01015, over 15509.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0909, pruned_loss=0.01252, audio_tagging_loss=0.008868, over 3049006.40 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:45:09,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3468506.6666666665, ans=0.125 2023-11-28 10:45:28,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3468573.3333333335, ans=0.125 2023-11-28 10:45:32,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2023-11-28 10:45:33,488 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-28 10:45:46,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3468706.6666666665, ans=0.0 2023-11-28 10:46:05,102 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3300, loss[loss=0.08065, simple_loss=0.1243, pruned_loss=0.01248, audio_tagging_loss=0.006005, over 15457.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09051, pruned_loss=0.01235, audio_tagging_loss=0.008983, over 3041176.80 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:46:05,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3468840.0, ans=0.0 2023-11-28 10:46:07,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3468840.0, ans=0.0 2023-11-28 10:46:08,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.967e+01 9.560e+01 1.010e+02 1.793e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:46:25,358 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:46:30,801 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-28 10:46:42,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3469040.0, ans=0.0 2023-11-28 10:46:46,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3469040.0, ans=0.125 2023-11-28 10:46:56,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3469106.6666666665, ans=0.2 2023-11-28 10:47:03,680 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3350, loss[loss=0.0671, simple_loss=0.09017, pruned_loss=0.01317, audio_tagging_loss=0.008846, over 15139.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08967, pruned_loss=0.01235, audio_tagging_loss=0.008882, over 3041296.45 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:47:06,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3469173.3333333335, ans=0.2 2023-11-28 10:47:11,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3469173.3333333335, ans=0.2 2023-11-28 10:47:11,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3469173.3333333335, ans=0.125 2023-11-28 10:47:17,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3469240.0, ans=0.125 2023-11-28 10:47:17,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3469240.0, ans=0.2 2023-11-28 10:47:28,707 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-28 10:47:35,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3469306.6666666665, ans=0.0 2023-11-28 10:47:38,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3469373.3333333335, ans=0.0 2023-11-28 10:47:41,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3469373.3333333335, ans=0.125 2023-11-28 10:47:48,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3469373.3333333335, ans=0.0 2023-11-28 10:48:01,330 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3400, loss[loss=0.08266, simple_loss=0.122, pruned_loss=0.01589, audio_tagging_loss=0.005795, over 15842.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09014, pruned_loss=0.01245, audio_tagging_loss=0.008753, over 3040671.40 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:48:05,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.926e+01 9.389e+01 1.002e+02 1.280e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 10:48:24,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3469640.0, ans=0.0 2023-11-28 10:48:27,285 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-28 10:48:59,556 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3450, loss[loss=0.04904, simple_loss=0.06573, pruned_loss=0.009036, audio_tagging_loss=0.007138, over 14328.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0895, pruned_loss=0.01228, audio_tagging_loss=0.008802, over 3038010.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:49:14,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-11-28 10:49:25,420 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-28 10:49:39,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3470040.0, ans=0.2 2023-11-28 10:49:58,023 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3500, loss[loss=0.06711, simple_loss=0.09449, pruned_loss=0.01126, audio_tagging_loss=0.008607, over 14846.00 frames. 
], tot_loss[loss=0.06618, simple_loss=0.09025, pruned_loss=0.01236, audio_tagging_loss=0.008697, over 3040481.38 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:50:02,334 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.047e+01 9.689e+01 1.031e+02 1.305e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 10:50:09,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2023-11-28 10:50:14,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3470240.0, ans=0.05 2023-11-28 10:50:23,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-28 10:50:30,298 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:50:37,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3470373.3333333335, ans=0.07 2023-11-28 10:50:56,673 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3550, loss[loss=0.05927, simple_loss=0.07953, pruned_loss=0.01078, audio_tagging_loss=0.008726, over 16218.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08993, pruned_loss=0.01226, audio_tagging_loss=0.008654, over 3043559.70 frames. ], batch size: 63, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:51:00,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3470506.6666666665, ans=0.125 2023-11-28 10:51:22,630 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-28 10:51:32,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3470706.6666666665, ans=0.015 2023-11-28 10:51:36,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3470706.6666666665, ans=0.5 2023-11-28 10:51:47,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3470773.3333333335, ans=0.2 2023-11-28 10:51:55,153 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3600, loss[loss=0.06856, simple_loss=0.08691, pruned_loss=0.01523, audio_tagging_loss=0.00987, over 15049.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08999, pruned_loss=0.01232, audio_tagging_loss=0.008714, over 3042500.42 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:51:55,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-11-28 10:52:00,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.557e+01 8.694e+01 9.447e+01 1.046e+02 1.297e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:52:05,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.61 vs. 
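limit=15.0

The dense `[scaling.py:213]` "ScheduledFloat" records show that many regularization constants in the Zipformer (dropout `p`, `skip_rate`s, balancer `prob`s, bypass `scale_min`s) are evaluated as functions of the current global `batch_count` rather than being fixed; `ans` is the value in effect for that module at that step. A minimal sketch of such a piecewise-linear schedule follows, with illustrative breakpoints rather than the ones this run used.

```python
class ScheduledFloat:
    """Piecewise-linear function of batch_count (illustrative sketch)."""
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation on [x0, x1]
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3470906.67))  # 0.1: long past the last breakpoint
```

At batch_count around 3.47e6 every schedule in this run has flattened out, which is why the logged `ans` values repeat the same constants (0.0, 0.1, 0.125, 0.2, ...).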
2023-11-28 10:52:08,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3470906.6666666665, ans=0.125 2023-11-28 10:52:19,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3470973.3333333335, ans=0.0 2023-11-28 10:52:21,664 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-28 10:52:54,243 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3650, loss[loss=0.07425, simple_loss=0.1037, pruned_loss=0.01301, audio_tagging_loss=0.009371, over 14656.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0894, pruned_loss=0.01235, audio_tagging_loss=0.008705, over 3046151.22 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:52:56,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3471173.3333333335, ans=0.0 2023-11-28 10:53:06,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3471240.0, ans=0.2 2023-11-28 10:53:19,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-28 10:53:34,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3471373.3333333335, ans=0.0 2023-11-28 10:53:41,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3471440.0, ans=0.2 2023-11-28 10:53:41,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3471440.0, ans=0.125 2023-11-28 10:53:43,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2023-11-28 10:53:43,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3471440.0, ans=0.125 2023-11-28 10:53:52,251 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3700, loss[loss=0.05933, simple_loss=0.07909, pruned_loss=0.01118, audio_tagging_loss=0.008598, over 15641.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08971, pruned_loss=0.01231, audio_tagging_loss=0.008688, over 3044014.08 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:53:59,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.858e+01 9.302e+01 9.977e+01 1.303e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:54:16,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.37 vs. limit=10.0 2023-11-28 10:54:19,254 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-28 10:54:40,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3471773.3333333335, ans=0.1 2023-11-28 10:54:51,717 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3750, loss[loss=0.06077, simple_loss=0.08822, pruned_loss=0.009794, audio_tagging_loss=0.006871, over 15181.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09089, pruned_loss=0.01226, audio_tagging_loss=0.008631, over 3043185.23 frames.
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:01,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3471840.0, ans=0.125 2023-11-28 10:55:01,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3471840.0, ans=0.1 2023-11-28 10:55:08,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3471906.6666666665, ans=0.125 2023-11-28 10:55:17,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-28 10:55:35,485 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:55:51,424 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3800, loss[loss=0.05836, simple_loss=0.07822, pruned_loss=0.009456, audio_tagging_loss=0.009799, over 15212.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09116, pruned_loss=0.01239, audio_tagging_loss=0.008742, over 3045810.65 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:57,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3472173.3333333335, ans=0.2 2023-11-28 10:55:58,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.010e+01 9.587e+01 1.023e+02 1.351e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 10:55:58,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472173.3333333335, ans=0.1 2023-11-28 10:56:16,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-28 10:56:26,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3472373.3333333335, ans=0.05 2023-11-28 10:56:31,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-28 10:56:49,701 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3850, loss[loss=0.06783, simple_loss=0.09767, pruned_loss=0.01175, audio_tagging_loss=0.007243, over 14596.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0912, pruned_loss=0.01231, audio_tagging_loss=0.008658, over 3044140.04 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:56:52,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. 
limit=15.0 2023-11-28 10:56:54,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3472506.6666666665, ans=0.125 2023-11-28 10:57:12,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3472640.0, ans=0.125 2023-11-28 10:57:15,574 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-28 10:57:48,652 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3900, loss[loss=0.0662, simple_loss=0.08715, pruned_loss=0.01303, audio_tagging_loss=0.009604, over 16517.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09083, pruned_loss=0.01235, audio_tagging_loss=0.008792, over 3045536.55 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:57:56,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.789e+01 9.361e+01 1.021e+02 3.606e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-28 10:57:57,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2023-11-28 10:58:00,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472906.6666666665, ans=0.1 2023-11-28 10:58:03,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3472906.6666666665, ans=0.125 2023-11-28 10:58:08,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2023-11-28 10:58:13,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3472973.3333333335, ans=0.125 2023-11-28 10:58:14,620 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-28 10:58:24,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3473040.0, ans=0.2 2023-11-28 10:58:40,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3473106.6666666665, ans=0.125 2023-11-28 10:58:40,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473106.6666666665, ans=0.1 2023-11-28 10:58:45,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3473106.6666666665, ans=0.1 2023-11-28 10:58:48,211 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 3950, loss[loss=0.05796, simple_loss=0.08197, pruned_loss=0.008468, audio_tagging_loss=0.008502, over 15484.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09027, pruned_loss=0.01223, audio_tagging_loss=0.008871, over 3045013.65 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:58:55,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3473173.3333333335, ans=0.125 2023-11-28 10:59:00,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.89 vs. 
limit=15.0 2023-11-28 10:59:00,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3473240.0, ans=0.125 2023-11-28 10:59:03,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2023-11-28 10:59:12,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-28 10:59:24,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3473373.3333333335, ans=0.025 2023-11-28 10:59:33,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2023-11-28 10:59:46,258 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4000, loss[loss=0.08711, simple_loss=0.1215, pruned_loss=0.01855, audio_tagging_loss=0.007814, over 17059.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09037, pruned_loss=0.01223, audio_tagging_loss=0.008896, over 3048481.54 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:59:49,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3473506.6666666665, ans=0.1 2023-11-28 10:59:52,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.959e+01 9.483e+01 1.017e+02 1.499e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 10:59:59,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2023-11-28 11:00:01,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0 2023-11-28 11:00:12,075 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-28 11:00:12,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-28 11:00:37,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3473773.3333333335, ans=0.125 2023-11-28 11:00:40,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-28 11:00:44,040 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4050, loss[loss=0.05618, simple_loss=0.06921, pruned_loss=0.01125, audio_tagging_loss=0.01033, over 14647.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09007, pruned_loss=0.01218, audio_tagging_loss=0.008897, over 3052121.29 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:00:50,386 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
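Number of tokens: 24

These WARNING records all follow the same pattern: a 1-second AudioSet clip (100 feature frames) keeps only 23 frames after the encoder's roughly 4x subsampling, while its placeholder transcript encodes to 24 BPE tokens. A transducer cannot emit more symbols than it has output frames, so train_asr.py drops such cuts before batching. Below is a hedged sketch of that check; the exact convolutional geometry in the real front-end may differ slightly, and the function name is illustrative.

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    # Approximate frames surviving the encoder front-end; (100 - 7) // 4
    # reproduces the "23 frames after subsampling" figure in the log.
    t = (num_frames - 7) // subsampling_factor
    return t >= num_tokens

print(keep_cut(num_frames=100, num_tokens=24))  # False -> cut excluded
```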
2023-11-28 11:01:10,338 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-28 11:01:12,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3473973.3333333335, ans=0.1 2023-11-28 11:01:17,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3473973.3333333335, ans=0.0 2023-11-28 11:01:18,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3474040.0, ans=0.1 2023-11-28 11:01:30,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3474106.6666666665, ans=0.95 2023-11-28 11:01:32,349 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:01:42,741 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4100, loss[loss=0.06956, simple_loss=0.09605, pruned_loss=0.01457, audio_tagging_loss=0.006967, over 15822.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09053, pruned_loss=0.01229, audio_tagging_loss=0.008829, over 3049927.75 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:01:51,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.779e+01 9.580e+01 1.037e+02 1.315e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 11:02:08,152 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-28 11:02:10,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3474306.6666666665, ans=0.125 2023-11-28 11:02:27,576 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:02:41,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-28 11:02:41,702 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4150, loss[loss=0.0528, simple_loss=0.07888, pruned_loss=0.007526, audio_tagging_loss=0.005837, over 15961.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09113, pruned_loss=0.01244, audio_tagging_loss=0.00873, over 3046041.60 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:02:48,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3474506.6666666665, ans=0.1 2023-11-28 11:02:50,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3474506.6666666665, ans=0.0 2023-11-28 11:02:54,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3474573.3333333335, ans=0.2 2023-11-28 11:03:06,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs.
limit=6.0 2023-11-28 11:03:07,966 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-28 11:03:26,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3474706.6666666665, ans=0.125 2023-11-28 11:03:28,264 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:03:40,523 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4200, loss[loss=0.05947, simple_loss=0.08932, pruned_loss=0.007229, audio_tagging_loss=0.007576, over 14938.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09043, pruned_loss=0.01224, audio_tagging_loss=0.00869, over 3053056.14 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:03:40,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3474840.0, ans=0.125 2023-11-28 11:03:43,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3474840.0, ans=0.125 2023-11-28 11:03:49,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.843e+01 9.445e+01 1.017e+02 1.271e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 11:04:03,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3474906.6666666665, ans=0.05 2023-11-28 11:04:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3474973.3333333335, ans=0.125 2023-11-28 11:04:07,606 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-28 11:04:18,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3475040.0, ans=0.2 2023-11-28 11:04:35,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=12.0 2023-11-28 11:04:39,682 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4250, loss[loss=0.06745, simple_loss=0.09606, pruned_loss=0.0141, audio_tagging_loss=0.005321, over 15340.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09137, pruned_loss=0.01252, audio_tagging_loss=0.008535, over 3047430.41 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:04:45,339 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:05:01,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3475240.0, ans=0.2 2023-11-28 11:05:05,809 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-28 11:05:22,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3475373.3333333335, ans=0.2 2023-11-28 11:05:39,415 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4300, loss[loss=0.07677, simple_loss=0.1077, pruned_loss=0.01421, audio_tagging_loss=0.008716, over 14839.00 frames. 
], tot_loss[loss=0.06652, simple_loss=0.09107, pruned_loss=0.0125, audio_tagging_loss=0.008488, over 3044999.99 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:05:47,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.879e+01 9.468e+01 1.032e+02 1.370e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 11:05:48,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3475506.6666666665, ans=0.125 2023-11-28 11:06:04,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-28 11:06:26,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3475773.3333333335, ans=0.125 2023-11-28 11:06:36,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3475840.0, ans=0.0 2023-11-28 11:06:37,550 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4350, loss[loss=0.07589, simple_loss=0.1038, pruned_loss=0.01528, audio_tagging_loss=0.008713, over 15474.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09104, pruned_loss=0.01253, audio_tagging_loss=0.008512, over 3051014.33 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:06:46,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2023-11-28 11:06:53,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3475906.6666666665, ans=0.2 2023-11-28 11:07:04,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-28 11:07:08,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.41 vs. limit=6.0 2023-11-28 11:07:27,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0 2023-11-28 11:07:32,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3476106.6666666665, ans=0.05 2023-11-28 11:07:32,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476106.6666666665, ans=0.1 2023-11-28 11:07:36,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.73 vs. limit=10.0 2023-11-28 11:07:36,289 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4400, loss[loss=0.07386, simple_loss=0.09403, pruned_loss=0.01672, audio_tagging_loss=0.01013, over 14741.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09152, pruned_loss=0.01257, audio_tagging_loss=0.008482, over 3049607.95 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:07:42,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3476173.3333333335, ans=0.125 2023-11-28 11:07:42,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
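limit=6.0

The `grad_scale` field in the loss records (8.0, 16.0, 32.0 at different points in this section) is the dynamic loss scale of fp16 mixed-precision training: it is halved when a step produces non-finite gradients and grows back after a run of clean steps. Below is a generic `torch.cuda.amp` sketch of that mechanism; the constructor arguments and helper are illustrative, not this run's settings.

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # skipped if gradients overflowed
    scaler.update()                 # adjusts the scale logged as grad_scale
    return loss.item(), scaler.get_scale()
```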
limit=6.0 2023-11-28 11:07:44,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.068e+01 9.728e+01 1.034e+02 1.377e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 11:07:59,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3476306.6666666665, ans=0.125 2023-11-28 11:08:02,164 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-28 11:08:15,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3476373.3333333335, ans=0.125 2023-11-28 11:08:29,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476440.0, ans=0.1 2023-11-28 11:08:35,685 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4450, loss[loss=0.06019, simple_loss=0.08417, pruned_loss=0.00726, audio_tagging_loss=0.01084, over 15524.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09075, pruned_loss=0.01237, audio_tagging_loss=0.008416, over 3045449.30 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:08:37,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3476506.6666666665, ans=0.1 2023-11-28 11:08:42,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3476506.6666666665, ans=0.2 2023-11-28 11:08:56,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3476640.0, ans=0.125 2023-11-28 11:09:00,808 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-28 11:09:03,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3476640.0, ans=0.0 2023-11-28 11:09:09,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5 2023-11-28 11:09:16,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3476706.6666666665, ans=0.125 2023-11-28 11:09:18,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3476706.6666666665, ans=0.2 2023-11-28 11:09:32,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-28 11:09:33,522 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4500, loss[loss=0.06028, simple_loss=0.07271, pruned_loss=0.01124, audio_tagging_loss=0.01268, over 16005.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09073, pruned_loss=0.01246, audio_tagging_loss=0.008416, over 3048022.70 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:09:41,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.818e+01 9.367e+01 9.979e+01 1.467e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 11:09:47,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3476906.6666666665, ans=0.125 2023-11-28 11:09:48,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2023-11-28 11:09:51,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3476906.6666666665, ans=0.04949747468305833 2023-11-28 11:09:59,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-28 11:10:00,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2023-11-28 11:10:13,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477040.0, ans=0.1 2023-11-28 11:10:27,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-28 11:10:32,144 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4550, loss[loss=0.07167, simple_loss=0.1075, pruned_loss=0.0127, audio_tagging_loss=0.005225, over 15928.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09116, pruned_loss=0.01247, audio_tagging_loss=0.008421, over 3051332.63 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:10:33,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3477173.3333333335, ans=0.125 2023-11-28 11:10:58,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-28 11:11:12,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3477373.3333333335, ans=0.125 2023-11-28 11:11:21,736 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:11:31,620 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4600, loss[loss=0.07019, simple_loss=0.09073, pruned_loss=0.01539, audio_tagging_loss=0.009434, over 14507.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09075, pruned_loss=0.01248, audio_tagging_loss=0.008513, over 3047941.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:11:36,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3477506.6666666665, ans=0.0 2023-11-28 11:11:38,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3477506.6666666665, ans=0.1 2023-11-28 11:11:39,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.873e+01 9.292e+01 1.017e+02 1.163e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 11:11:41,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2023-11-28 11:11:46,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. 
limit=10.0 2023-11-28 11:11:50,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3477573.3333333335, ans=0.125 2023-11-28 11:11:56,659 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-28 11:11:59,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3477640.0, ans=0.04949747468305833 2023-11-28 11:12:27,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3477773.3333333335, ans=0.125 2023-11-28 11:12:30,117 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4650, loss[loss=0.07946, simple_loss=0.1088, pruned_loss=0.01628, audio_tagging_loss=0.008785, over 15566.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0908, pruned_loss=0.01253, audio_tagging_loss=0.008581, over 3047016.65 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:12:55,423 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-28 11:13:04,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3478040.0, ans=0.0 2023-11-28 11:13:04,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3478040.0, ans=0.0 2023-11-28 11:13:07,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3478040.0, ans=0.1 2023-11-28 11:13:14,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.0 2023-11-28 11:13:15,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3478040.0, ans=0.0 2023-11-28 11:13:28,662 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4700, loss[loss=0.06651, simple_loss=0.09194, pruned_loss=0.01306, audio_tagging_loss=0.007477, over 16985.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09033, pruned_loss=0.01257, audio_tagging_loss=0.008736, over 3053107.42 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:13:36,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.974e+01 9.921e+01 1.076e+02 1.441e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 11:13:43,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3478240.0, ans=0.125 2023-11-28 11:13:48,100 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:13:55,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-28 11:14:00,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3478306.6666666665, ans=0.125 2023-11-28 11:14:12,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. 
limit=22.5 2023-11-28 11:14:19,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3478440.0, ans=0.125 2023-11-28 11:14:21,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478440.0, ans=0.1 2023-11-28 11:14:27,483 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4750, loss[loss=0.05937, simple_loss=0.07578, pruned_loss=0.01123, audio_tagging_loss=0.01025, over 16628.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09036, pruned_loss=0.01263, audio_tagging_loss=0.00881, over 3051548.44 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:14:37,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2023-11-28 11:14:50,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3478640.0, ans=0.0 2023-11-28 11:14:52,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-28 11:14:55,306 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:14:56,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3478640.0, ans=0.125 2023-11-28 11:14:59,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-28 11:15:01,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3478706.6666666665, ans=0.125 2023-11-28 11:15:03,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3478706.6666666665, ans=0.0 2023-11-28 11:15:06,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-28 11:15:12,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3478706.6666666665, ans=0.125 2023-11-28 11:15:22,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3478773.3333333335, ans=0.0 2023-11-28 11:15:24,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3478840.0, ans=0.125 2023-11-28 11:15:25,682 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4800, loss[loss=0.06592, simple_loss=0.08917, pruned_loss=0.01047, audio_tagging_loss=0.01087, over 14744.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09101, pruned_loss=0.01261, audio_tagging_loss=0.008794, over 3045228.07 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:15:26,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478840.0, ans=0.1 2023-11-28 11:15:34,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.828e+01 9.577e+01 1.068e+02 1.342e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:15:49,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:15:51,093 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-28 11:15:53,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3478973.3333333335, ans=0.0 2023-11-28 11:15:55,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3478973.3333333335, ans=0.09899494936611666 2023-11-28 11:16:05,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3479040.0, ans=0.0 2023-11-28 11:16:07,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3479040.0, ans=0.125 2023-11-28 11:16:10,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3479040.0, ans=0.0 2023-11-28 11:16:23,921 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4850, loss[loss=0.06349, simple_loss=0.09237, pruned_loss=0.01041, audio_tagging_loss=0.006896, over 14860.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09036, pruned_loss=0.01253, audio_tagging_loss=0.008859, over 3039667.83 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:16:26,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3479173.3333333335, ans=0.125 2023-11-28 11:16:28,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2023-11-28 11:16:49,894 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-28 11:17:18,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=22.5 2023-11-28 11:17:22,684 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4900, loss[loss=0.09217, simple_loss=0.1323, pruned_loss=0.0206, audio_tagging_loss=0.005433, over 16536.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09048, pruned_loss=0.01244, audio_tagging_loss=0.008882, over 3045077.89 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:17:30,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. 
limit=15.0 2023-11-28 11:17:32,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.706e+01 9.491e+01 1.021e+02 1.931e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 11:17:40,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3479573.3333333335, ans=0.125 2023-11-28 11:17:48,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-28 11:17:50,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3479640.0, ans=0.125 2023-11-28 11:17:51,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3479640.0, ans=0.125 2023-11-28 11:18:15,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3479773.3333333335, ans=0.09899494936611666 2023-11-28 11:18:21,532 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 4950, loss[loss=0.07192, simple_loss=0.08956, pruned_loss=0.01552, audio_tagging_loss=0.01162, over 15341.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09014, pruned_loss=0.01229, audio_tagging_loss=0.008798, over 3038629.14 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:18:47,268 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-28 11:18:50,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3479973.3333333335, ans=0.1 2023-11-28 11:18:50,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.68 vs. limit=6.0 2023-11-28 11:19:03,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3480040.0, ans=0.125 2023-11-28 11:19:20,162 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5000, loss[loss=0.06911, simple_loss=0.09324, pruned_loss=0.01392, audio_tagging_loss=0.008569, over 15398.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09, pruned_loss=0.01228, audio_tagging_loss=0.008684, over 3036892.02 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:19:29,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.777e+01 9.263e+01 9.841e+01 1.147e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 11:19:33,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3480240.0, ans=0.025 2023-11-28 11:19:46,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-28 11:19:51,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3480306.6666666665, ans=0.0 2023-11-28 11:20:01,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3480373.3333333335, ans=0.125 2023-11-28 11:20:06,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3480440.0, ans=0.2 2023-11-28 11:20:18,904 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5050, loss[loss=0.05338, simple_loss=0.07703, pruned_loss=0.006795, audio_tagging_loss=0.008067, over 15309.00 frames. 
], tot_loss[loss=0.06593, simple_loss=0.09004, pruned_loss=0.01228, audio_tagging_loss=0.008635, over 3040433.17 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:20:27,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3480506.6666666665, ans=0.2 2023-11-28 11:20:36,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-28 11:20:38,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-28 11:20:41,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-28 11:20:44,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-28 11:21:17,564 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5100, loss[loss=0.07164, simple_loss=0.09233, pruned_loss=0.01476, audio_tagging_loss=0.01071, over 15142.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08957, pruned_loss=0.0121, audio_tagging_loss=0.008682, over 3049312.95 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:21:19,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3480840.0, ans=0.0 2023-11-28 11:21:26,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.858e+01 9.488e+01 1.012e+02 1.214e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:21:26,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3480840.0, ans=0.125 2023-11-28 11:21:43,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-28 11:21:59,237 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:22:01,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3481040.0, ans=10.0 2023-11-28 11:22:10,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-11-28 11:22:11,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3481106.6666666665, ans=0.125 2023-11-28 11:22:15,699 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5150, loss[loss=0.06233, simple_loss=0.07999, pruned_loss=0.01273, audio_tagging_loss=0.009603, over 15249.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08924, pruned_loss=0.01218, audio_tagging_loss=0.008652, over 3047416.88 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:22:42,087 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-28 11:22:56,875 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:23:01,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3481440.0, ans=0.0 2023-11-28 11:23:14,781 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5200, loss[loss=0.06933, simple_loss=0.0958, pruned_loss=0.01338, audio_tagging_loss=0.008051, over 15207.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09016, pruned_loss=0.01247, audio_tagging_loss=0.008598, over 3049983.78 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:23:18,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2023-11-28 11:23:24,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.751e+01 9.601e+01 1.026e+02 1.242e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 11:23:40,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-28 11:23:49,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=22.5 2023-11-28 11:23:58,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3481706.6666666665, ans=0.1 2023-11-28 11:23:58,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3481706.6666666665, ans=0.125 2023-11-28 11:24:02,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-28 11:24:12,209 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5250, loss[loss=0.06343, simple_loss=0.09127, pruned_loss=0.0102, audio_tagging_loss=0.007597, over 15646.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09102, pruned_loss=0.01256, audio_tagging_loss=0.008523, over 3058536.00 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:24:35,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3481973.3333333335, ans=0.1 2023-11-28 11:24:36,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.91 vs. 
limit=22.5 2023-11-28 11:24:37,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-28 11:24:42,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3481973.3333333335, ans=0.0 2023-11-28 11:24:56,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3482040.0, ans=0.125 2023-11-28 11:24:57,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3482106.6666666665, ans=0.125 2023-11-28 11:24:59,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3482106.6666666665, ans=0.125 2023-11-28 11:25:09,473 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5300, loss[loss=0.05748, simple_loss=0.07601, pruned_loss=0.009697, audio_tagging_loss=0.009779, over 15645.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09061, pruned_loss=0.01241, audio_tagging_loss=0.008518, over 3057713.01 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:25:19,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.992e+01 9.491e+01 1.033e+02 1.599e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:25:26,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3482240.0, ans=0.0 2023-11-28 11:25:35,855 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-28 11:25:42,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2023-11-28 11:26:06,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-28 11:26:07,643 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5350, loss[loss=0.06237, simple_loss=0.08489, pruned_loss=0.01208, audio_tagging_loss=0.007848, over 14674.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09126, pruned_loss=0.0125, audio_tagging_loss=0.008452, over 3050990.44 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:26:07,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-28 11:26:11,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3482506.6666666665, ans=0.0 2023-11-28 11:26:18,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. 
limit=15.0 2023-11-28 11:26:19,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3482573.3333333335, ans=0.05 2023-11-28 11:26:33,724 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-28 11:26:36,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3482640.0, ans=0.125 2023-11-28 11:27:01,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3482773.3333333335, ans=0.125 2023-11-28 11:27:07,437 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5400, loss[loss=0.07113, simple_loss=0.1046, pruned_loss=0.0105, audio_tagging_loss=0.008311, over 16965.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09099, pruned_loss=0.01235, audio_tagging_loss=0.008529, over 3047392.81 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:27:17,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.830e+01 9.403e+01 1.046e+02 1.380e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:27:30,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3482973.3333333335, ans=0.125 2023-11-28 11:27:31,940 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-28 11:27:34,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3482973.3333333335, ans=0.125 2023-11-28 11:27:39,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3482973.3333333335, ans=0.2 2023-11-28 11:27:42,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3483040.0, ans=0.0 2023-11-28 11:28:05,747 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5450, loss[loss=0.05377, simple_loss=0.06804, pruned_loss=0.008383, audio_tagging_loss=0.01137, over 14781.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08988, pruned_loss=0.0122, audio_tagging_loss=0.008591, over 3046509.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:28:08,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-28 11:28:32,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-28 11:28:43,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2023-11-28 11:28:51,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3483373.3333333335, ans=0.0 2023-11-28 11:28:52,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. 
limit=12.0 2023-11-28 11:28:54,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3483440.0, ans=0.125 2023-11-28 11:28:58,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3483440.0, ans=0.0 2023-11-28 11:29:04,439 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5500, loss[loss=0.05707, simple_loss=0.07727, pruned_loss=0.009625, audio_tagging_loss=0.008813, over 15341.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09029, pruned_loss=0.01227, audio_tagging_loss=0.00857, over 3051815.13 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:29:04,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3483506.6666666665, ans=0.125 2023-11-28 11:29:15,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.610e+01 9.341e+01 1.002e+02 1.177e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 11:29:15,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2023-11-28 11:29:21,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3483573.3333333335, ans=10.0 2023-11-28 11:29:23,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=22.5 2023-11-28 11:29:30,906 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-28 11:29:42,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2023-11-28 11:30:03,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3483840.0, ans=0.0 2023-11-28 11:30:04,949 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5550, loss[loss=0.0468, simple_loss=0.05794, pruned_loss=0.006617, audio_tagging_loss=0.01121, over 14299.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09093, pruned_loss=0.01231, audio_tagging_loss=0.00861, over 3048415.04 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:30:05,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=3483840.0, ans=15.0 2023-11-28 11:30:09,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. 
limit=15.0 2023-11-28 11:30:19,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3483906.6666666665, ans=0.125 2023-11-28 11:30:20,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3483906.6666666665, ans=0.0 2023-11-28 11:30:23,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3483906.6666666665, ans=0.125 2023-11-28 11:30:29,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-28 11:30:44,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3484040.0, ans=0.2 2023-11-28 11:31:04,157 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5600, loss[loss=0.07907, simple_loss=0.1052, pruned_loss=0.01919, audio_tagging_loss=0.007302, over 15435.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09139, pruned_loss=0.01247, audio_tagging_loss=0.0087, over 3048271.94 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:31:06,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3484173.3333333335, ans=0.2 2023-11-28 11:31:14,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.030e+01 9.835e+01 1.064e+02 3.078e+02, threshold=1.967e+02, percent-clipped=1.0 2023-11-28 11:31:19,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-11-28 11:31:20,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3484240.0, ans=0.09899494936611666 2023-11-28 11:31:26,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-28 11:31:29,464 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-28 11:31:51,256 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:31:53,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3484440.0, ans=0.0 2023-11-28 11:32:02,662 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5650, loss[loss=0.05961, simple_loss=0.08341, pruned_loss=0.009191, audio_tagging_loss=0.00871, over 15295.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09101, pruned_loss=0.01239, audio_tagging_loss=0.008757, over 3052254.18 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:32:30,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-28 11:33:01,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.64 vs. 
limit=10.0 2023-11-28 11:33:02,816 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5700, loss[loss=0.06961, simple_loss=0.08883, pruned_loss=0.01584, audio_tagging_loss=0.009359, over 16072.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09009, pruned_loss=0.01222, audio_tagging_loss=0.008789, over 3053658.55 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:33:04,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3484840.0, ans=0.0 2023-11-28 11:33:14,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3484906.6666666665, ans=0.025 2023-11-28 11:33:15,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.782e+01 9.296e+01 1.023e+02 1.172e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 11:33:28,852 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-28 11:33:36,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-28 11:33:40,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-28 11:33:49,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3485106.6666666665, ans=0.125 2023-11-28 11:34:01,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3485173.3333333335, ans=0.125 2023-11-28 11:34:02,534 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5750, loss[loss=0.04899, simple_loss=0.07447, pruned_loss=0.004452, audio_tagging_loss=0.007299, over 15956.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.0906, pruned_loss=0.01222, audio_tagging_loss=0.008672, over 3061410.71 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:34:03,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3485173.3333333335, ans=0.2 2023-11-28 11:34:25,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-11-28 11:34:28,187 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-28 11:35:01,855 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5800, loss[loss=0.05304, simple_loss=0.07006, pruned_loss=0.01047, audio_tagging_loss=0.007547, over 13502.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09077, pruned_loss=0.01243, audio_tagging_loss=0.008522, over 3060518.62 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:35:10,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.31 vs. 
limit=15.0 2023-11-28 11:35:13,801 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.794e+01 9.521e+01 1.033e+02 1.295e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 11:35:25,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3485640.0, ans=0.2 2023-11-28 11:35:25,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-28 11:35:28,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-28 11:35:49,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3485773.3333333335, ans=0.125 2023-11-28 11:35:54,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3485773.3333333335, ans=0.0 2023-11-28 11:35:58,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3485773.3333333335, ans=0.125 2023-11-28 11:36:00,888 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5850, loss[loss=0.06486, simple_loss=0.08828, pruned_loss=0.01227, audio_tagging_loss=0.008448, over 16514.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09, pruned_loss=0.01231, audio_tagging_loss=0.008501, over 3065676.54 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:36:19,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-28 11:36:22,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3485906.6666666665, ans=0.125 2023-11-28 11:36:26,676 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-28 11:36:51,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-28 11:36:59,230 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5900, loss[loss=0.07087, simple_loss=0.09288, pruned_loss=0.01446, audio_tagging_loss=0.009972, over 14525.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09004, pruned_loss=0.01241, audio_tagging_loss=0.008524, over 3069086.01 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:37:07,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. 
limit=15.0 2023-11-28 11:37:11,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.935e+01 9.645e+01 1.023e+02 1.416e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 11:37:25,666 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-28 11:37:32,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3486306.6666666665, ans=0.1 2023-11-28 11:37:34,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3486373.3333333335, ans=0.1 2023-11-28 11:37:34,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3486373.3333333335, ans=0.125 2023-11-28 11:37:58,753 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 5950, loss[loss=0.06934, simple_loss=0.08935, pruned_loss=0.01803, audio_tagging_loss=0.006635, over 16253.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.0895, pruned_loss=0.01227, audio_tagging_loss=0.008604, over 3072434.14 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:38:20,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-28 11:38:22,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2023-11-28 11:38:24,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-28 11:38:32,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3486640.0, ans=0.025 2023-11-28 11:38:34,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-28 11:38:38,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3486706.6666666665, ans=0.125 2023-11-28 11:38:42,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3486706.6666666665, ans=0.2 2023-11-28 11:38:47,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3486773.3333333335, ans=0.125 2023-11-28 11:38:51,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3486773.3333333335, ans=0.125 2023-11-28 11:38:57,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2023-11-28 11:38:57,779 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6000, loss[loss=0.04899, simple_loss=0.06582, pruned_loss=0.007197, audio_tagging_loss=0.008885, over 14657.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08956, pruned_loss=0.01225, audio_tagging_loss=0.008561, over 3073416.86 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:38:57,780 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 11:39:33,653 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05792, simple_loss=0.0506, pruned_loss=0.005293, audio_tagging_loss=0.02732, over 4681554.00 frames. 
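A note on reading the `loss[...]`/`tot_loss[...]` records: throughout this section the reported total is consistent with a weighted sum of the logged components, loss ≈ 0.5 · simple_loss + pruned_loss + 1.0 · audio_tagging_loss. The validation record immediately above checks out: 0.5 × 0.0506 + 0.005293 + 0.02732 ≈ 0.05792, and the same relation holds for the per-batch training records. A minimal sketch of that combination; the scale values are inferred from the logged numbers and the function name is illustrative, not the actual train_asr.py code:

```python
def combine_losses(
    simple_loss: float,
    pruned_loss: float,
    audio_tagging_loss: float,
    simple_loss_scale: float = 0.5,       # inferred from the logged totals
    audio_tagging_loss_scale: float = 1.0,
) -> float:
    # Reproduces the validation record above:
    # 0.5 * 0.0506 + 0.005293 + 1.0 * 0.02732 ~= 0.05792
    return (
        simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss
    )

assert abs(combine_losses(0.0506, 0.005293, 0.02732) - 0.05792) < 1e-4
```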
2023-11-28 11:39:33,654 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 11:39:45,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.849e+01 9.422e+01 1.008e+02 1.234e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 11:39:59,578 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-28 11:40:13,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3487040.0, ans=0.0 2023-11-28 11:40:13,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3487040.0, ans=0.125 2023-11-28 11:40:13,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3487040.0, ans=0.2 2023-11-28 11:40:16,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3487040.0, ans=0.125 2023-11-28 11:40:20,222 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:40:31,911 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6050, loss[loss=0.07073, simple_loss=0.1056, pruned_loss=0.01067, audio_tagging_loss=0.007241, over 14899.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08984, pruned_loss=0.01225, audio_tagging_loss=0.008499, over 3073384.66 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:40:45,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3487240.0, ans=0.1 2023-11-28 11:40:55,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-28 11:40:58,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-28 11:41:01,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487306.6666666665, ans=0.1 2023-11-28 11:41:03,129 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:41:10,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3487373.3333333335, ans=0.125 2023-11-28 11:41:25,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3487440.0, ans=0.0 2023-11-28 11:41:31,077 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6100, loss[loss=0.08344, simple_loss=0.1141, pruned_loss=0.01719, audio_tagging_loss=0.009219, over 15221.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09057, pruned_loss=0.01239, audio_tagging_loss=0.008458, over 3071537.92 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:41:32,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487506.6666666665, ans=0.1 2023-11-28 11:41:34,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3487506.6666666665, ans=0.0 2023-11-28 11:41:43,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.905e+01 9.501e+01 1.004e+02 1.216e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 11:41:51,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-28 11:41:56,915 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523150 2023-11-28 11:42:08,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3487706.6666666665, ans=0.125 2023-11-28 11:42:12,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3487706.6666666665, ans=0.125 2023-11-28 11:42:20,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-28 11:42:30,247 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6150, loss[loss=0.06285, simple_loss=0.08952, pruned_loss=0.0111, audio_tagging_loss=0.006989, over 15961.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09055, pruned_loss=0.01235, audio_tagging_loss=0.008526, over 3063131.33 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:42:33,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.18 vs. limit=10.0 2023-11-28 11:42:38,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3487840.0, ans=0.2 2023-11-28 11:42:38,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3487840.0, ans=0.0 2023-11-28 11:42:39,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487840.0, ans=0.1 2023-11-28 11:42:48,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0 2023-11-28 11:42:56,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523200 2023-11-28 11:42:59,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3487973.3333333335, ans=0.2 2023-11-28 11:43:01,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. 
limit=10.0 2023-11-28 11:43:05,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3488040.0, ans=0.0 2023-11-28 11:43:13,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3488040.0, ans=0.0 2023-11-28 11:43:17,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3488106.6666666665, ans=0.04949747468305833 2023-11-28 11:43:25,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5 2023-11-28 11:43:28,704 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6200, loss[loss=0.06285, simple_loss=0.08336, pruned_loss=0.01295, audio_tagging_loss=0.008218, over 16464.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08963, pruned_loss=0.01202, audio_tagging_loss=0.008616, over 3057472.13 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:43:42,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.740e+01 9.407e+01 1.006e+02 1.193e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:43:47,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3488240.0, ans=0.125 2023-11-28 11:43:56,016 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523250 2023-11-28 11:43:59,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3488306.6666666665, ans=0.09899494936611666 2023-11-28 11:44:07,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5 2023-11-28 11:44:23,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3488440.0, ans=0.09899494936611666 2023-11-28 11:44:25,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3488440.0, ans=0.125 2023-11-28 11:44:28,616 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6250, loss[loss=0.0664, simple_loss=0.0901, pruned_loss=0.01156, audio_tagging_loss=0.00979, over 14801.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08939, pruned_loss=0.01195, audio_tagging_loss=0.008785, over 3055869.42 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:44:46,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488573.3333333335, ans=0.125 2023-11-28 11:44:46,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3488573.3333333335, ans=0.2 2023-11-28 11:44:54,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-28 11:45:15,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=12.0 2023-11-28 11:45:27,602 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6300, loss[loss=0.06566, simple_loss=0.08695, pruned_loss=0.01042, audio_tagging_loss=0.01176, over 15374.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09026, pruned_loss=0.01223, audio_tagging_loss=0.008809, over 3049342.06 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:45:28,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3488840.0, ans=0.125 2023-11-28 11:45:36,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3488840.0, ans=0.125 2023-11-28 11:45:39,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.790e+01 9.480e+01 1.019e+02 1.243e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 11:45:53,516 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-28 11:46:00,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3488973.3333333335, ans=0.1 2023-11-28 11:46:12,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-11-28 11:46:22,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489106.6666666665, ans=0.1 2023-11-28 11:46:25,335 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6350, loss[loss=0.07323, simple_loss=0.1001, pruned_loss=0.01439, audio_tagging_loss=0.008782, over 15347.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09073, pruned_loss=0.01253, audio_tagging_loss=0.008737, over 3048784.17 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:46:25,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3489173.3333333335, ans=0.125 2023-11-28 11:46:41,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3489240.0, ans=0.125 2023-11-28 11:46:42,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-28 11:46:46,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3489240.0, ans=0.025 2023-11-28 11:46:51,392 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-28 11:46:51,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3489306.6666666665, ans=0.0 2023-11-28 11:47:00,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3489373.3333333335, ans=0.125 2023-11-28 11:47:01,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.92 vs. limit=22.5 2023-11-28 11:47:12,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3489440.0, ans=0.1 2023-11-28 11:47:23,871 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6400, loss[loss=0.07501, simple_loss=0.09699, pruned_loss=0.01726, audio_tagging_loss=0.009247, over 14690.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09058, pruned_loss=0.01248, audio_tagging_loss=0.008815, over 3043675.30 frames. 
], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:47:36,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.832e+01 9.473e+01 1.012e+02 1.860e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 11:47:49,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-28 11:47:57,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3489706.6666666665, ans=0.125 2023-11-28 11:47:59,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3489706.6666666665, ans=0.1 2023-11-28 11:48:18,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3489773.3333333335, ans=0.125 2023-11-28 11:48:22,438 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6450, loss[loss=0.08039, simple_loss=0.1192, pruned_loss=0.01682, audio_tagging_loss=0.003988, over 15851.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09134, pruned_loss=0.01245, audio_tagging_loss=0.008743, over 3047782.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:48:33,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3489906.6666666665, ans=0.2 2023-11-28 11:48:38,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3489906.6666666665, ans=0.0 2023-11-28 11:48:46,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3489973.3333333335, ans=0.0 2023-11-28 11:48:47,568 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-28 11:48:53,954 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:49:01,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3490040.0, ans=0.125 2023-11-28 11:49:02,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3490040.0, ans=0.1 2023-11-28 11:49:12,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3490106.6666666665, ans=0.125 2023-11-28 11:49:17,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3490106.6666666665, ans=0.1 2023-11-28 11:49:19,346 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:49:20,229 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6500, loss[loss=0.07277, simple_loss=0.1017, pruned_loss=0.01143, audio_tagging_loss=0.01049, over 15015.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09034, pruned_loss=0.01224, audio_tagging_loss=0.008744, over 3049238.87 frames. 
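
Note: in the optim.py:476 records, the five quartile values are the min / 25% / median / 75% / max of recently observed gradient norms, and in every record in this section threshold = Clipping_scale × median (e.g. 2.0 × 9.473e+01 ≈ 1.895e+02 in the record above); percent-clipped then reports how often a batch's gradient norm exceeded that threshold. A hedged sketch of the bookkeeping over a simple rolling window (ScaledAdam's actual implementation in icefall's optim.py differs in detail):

import numpy as np

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    norms = np.asarray(recent_grad_norms, dtype=float)
    # Quartiles in the order the log prints them: min, 25%, median, 75%, max.
    q = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]  # 2x the running median of grad norms
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return q, threshold, percent_clipped
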
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:49:26,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-28 11:49:33,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 8.967e+01 9.507e+01 1.009e+02 1.264e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 11:49:46,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-28 11:49:53,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3490306.6666666665, ans=0.125 2023-11-28 11:50:07,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.55 vs. limit=10.0 2023-11-28 11:50:18,391 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6550, loss[loss=0.05674, simple_loss=0.07712, pruned_loss=0.01055, audio_tagging_loss=0.007632, over 15169.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09068, pruned_loss=0.01233, audio_tagging_loss=0.008618, over 3049214.51 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:50:18,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2023-11-28 11:50:28,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3490506.6666666665, ans=15.0 2023-11-28 11:50:32,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3490573.3333333335, ans=0.0 2023-11-28 11:50:39,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3490573.3333333335, ans=0.5 2023-11-28 11:50:41,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3490640.0, ans=0.0 2023-11-28 11:50:44,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-28 11:51:13,175 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:51:13,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0 2023-11-28 11:51:14,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3490773.3333333335, ans=0.2 2023-11-28 11:51:17,297 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6600, loss[loss=0.06046, simple_loss=0.08645, pruned_loss=0.008758, audio_tagging_loss=0.00848, over 15037.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08963, pruned_loss=0.01214, audio_tagging_loss=0.008566, over 3047240.99 frames. 
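
Note: the scaling.py:213 lines trace ScheduledFloat hyperparameters: each named quantity (conv/attention skip rates, balancer probs, dropout_p, ...) is a piecewise-linear function of batch_count, and ans is its current value; by batch_count ≈ 3.49e6 virtually all of them sit at their final constants (skip rates 0.0, balancer probs 0.125, dropout_p 0.1). A minimal sketch assuming plain linear interpolation between (batch_count, value) breakpoints, as ScheduledFloat in icefall's scaling.py does:

def scheduled_float(batch_count, points):
    # points: [(batch_count, value), ...] sorted by batch_count;
    # the schedule is constant outside the first/last breakpoint.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return points[-1][1]

# Illustrative breakpoints (not this run's actual schedule): a skip rate
# annealed from 0.5 to 0.0 over the first 4000 batches stays at 0.0 here.
assert scheduled_float(3488240.0, [(0.0, 0.5), (4000.0, 0.0)]) == 0.0
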
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:51:30,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.875e+01 9.605e+01 1.016e+02 1.315e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:51:37,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3490906.6666666665, ans=0.125 2023-11-28 11:51:41,956 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-28 11:51:43,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3490973.3333333335, ans=0.5 2023-11-28 11:51:55,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3491040.0, ans=0.125 2023-11-28 11:52:01,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-28 11:52:09,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3491106.6666666665, ans=0.125 2023-11-28 11:52:10,668 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:52:12,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5 2023-11-28 11:52:13,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=12.0 2023-11-28 11:52:14,943 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6650, loss[loss=0.0482, simple_loss=0.05901, pruned_loss=0.009032, audio_tagging_loss=0.009661, over 14418.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08922, pruned_loss=0.01216, audio_tagging_loss=0.008515, over 3049957.76 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:52:19,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3491173.3333333335, ans=0.2 2023-11-28 11:52:32,227 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:52:41,215 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-28 11:52:46,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3491306.6666666665, ans=0.125 2023-11-28 11:52:55,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-28 11:53:03,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.42 vs. 
limit=15.0 2023-11-28 11:53:04,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3491440.0, ans=0.2 2023-11-28 11:53:05,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3491440.0, ans=0.1 2023-11-28 11:53:07,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3491440.0, ans=0.125 2023-11-28 11:53:13,450 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6700, loss[loss=0.07664, simple_loss=0.1026, pruned_loss=0.01608, audio_tagging_loss=0.009265, over 14688.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08989, pruned_loss=0.01232, audio_tagging_loss=0.00848, over 3047163.62 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:53:28,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.754e+01 9.466e+01 1.016e+02 1.269e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 11:53:31,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3491573.3333333335, ans=0.2 2023-11-28 11:53:39,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-28 11:53:49,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3491706.6666666665, ans=0.125 2023-11-28 11:53:53,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3491706.6666666665, ans=0.125 2023-11-28 11:53:57,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3491706.6666666665, ans=0.125 2023-11-28 11:54:00,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=22.5 2023-11-28 11:54:08,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3491773.3333333335, ans=0.5 2023-11-28 11:54:12,334 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6750, loss[loss=0.05673, simple_loss=0.0784, pruned_loss=0.009503, audio_tagging_loss=0.008025, over 14988.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08968, pruned_loss=0.01221, audio_tagging_loss=0.008558, over 3044798.53 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:54:29,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3491906.6666666665, ans=0.125 2023-11-28 11:54:36,953 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-28 11:54:46,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-28 11:54:52,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3492040.0, ans=0.0 2023-11-28 11:55:01,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3492106.6666666665, ans=0.125 2023-11-28 11:55:10,805 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6800, loss[loss=0.05809, simple_loss=0.07354, pruned_loss=0.01098, audio_tagging_loss=0.01034, over 15267.00 frames. 
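
Note: the scaling.py:1022 "Whitening" lines compare a whiteness statistic of a module's activations against a limit (e.g. metric=17.33 vs. limit=22.5 above); the Whiten modules in icefall's scaling.py only inject a corrective gradient once the metric exceeds its limit, which no record in this section does. One statistic with the right behaviour, equal to 1.0 when all covariance eigenvalues are equal and growing as the activations become less white, is E[λ²]/E[λ]²; the sketch below computes that measure as an illustration only and is not claimed to be the exact formula scaling.py uses:

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations of one module.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]                   # (C, C) covariance
    mean_eig_sq = cov.diagonal().mean() ** 2       # (E[lambda])^2 = (trace/C)^2
    mean_sq_eig = (cov ** 2).sum() / cov.shape[0]  # E[lambda^2] = ||cov||_F^2 / C
    return (mean_sq_eig / mean_eig_sq).item()      # 1.0 iff perfectly white
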
], tot_loss[loss=0.06571, simple_loss=0.08968, pruned_loss=0.01237, audio_tagging_loss=0.0085, over 3046184.18 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:55:12,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3492173.3333333335, ans=0.0 2023-11-28 11:55:16,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3492173.3333333335, ans=0.125 2023-11-28 11:55:24,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.914e+01 9.606e+01 1.021e+02 1.348e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:55:31,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5 2023-11-28 11:55:32,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-28 11:55:35,907 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-28 11:55:56,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3492440.0, ans=0.2 2023-11-28 11:56:01,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3492440.0, ans=0.125 2023-11-28 11:56:09,088 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6850, loss[loss=0.05745, simple_loss=0.08621, pruned_loss=0.00852, audio_tagging_loss=0.005827, over 15131.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09001, pruned_loss=0.01244, audio_tagging_loss=0.008402, over 3047436.29 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:56:13,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-28 11:56:16,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-28 11:56:23,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3492573.3333333335, ans=0.1 2023-11-28 11:56:32,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492640.0, ans=0.1 2023-11-28 11:56:35,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-28 11:56:38,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3492640.0, ans=0.1 2023-11-28 11:56:50,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3492706.6666666665, ans=0.0 2023-11-28 11:56:50,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.06 vs. limit=10.0 2023-11-28 11:56:53,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3492706.6666666665, ans=0.0 2023-11-28 11:57:07,909 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6900, loss[loss=0.06912, simple_loss=0.08457, pruned_loss=0.01617, audio_tagging_loss=0.01067, over 14842.00 frames. 
], tot_loss[loss=0.06523, simple_loss=0.08903, pruned_loss=0.01225, audio_tagging_loss=0.008467, over 3045748.02 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:57:23,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.756e+01 9.577e+01 1.062e+02 1.292e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:57:33,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-28 11:57:57,188 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:58:00,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3493106.6666666665, ans=0.125 2023-11-28 11:58:03,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3493106.6666666665, ans=0.125 2023-11-28 11:58:06,566 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 6950, loss[loss=0.06003, simple_loss=0.08119, pruned_loss=0.01192, audio_tagging_loss=0.007515, over 15151.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09022, pruned_loss=0.01245, audio_tagging_loss=0.008447, over 3046774.18 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:58:06,831 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:58:10,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3493173.3333333335, ans=0.125 2023-11-28 11:58:14,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3493173.3333333335, ans=0.0 2023-11-28 11:58:29,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3493306.6666666665, ans=0.125 2023-11-28 11:58:31,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-28 11:58:48,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3493373.3333333335, ans=0.125 2023-11-28 11:58:51,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3493373.3333333335, ans=0.125 2023-11-28 11:58:55,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3493440.0, ans=0.125 2023-11-28 11:58:58,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3493440.0, ans=0.0 2023-11-28 11:58:58,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3493440.0, ans=10.0 2023-11-28 11:59:01,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3493440.0, ans=0.0 2023-11-28 11:59:06,781 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7000, loss[loss=0.04663, simple_loss=0.06681, 
pruned_loss=0.006039, audio_tagging_loss=0.007185, over 15631.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0894, pruned_loss=0.01217, audio_tagging_loss=0.008638, over 3048903.48 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:59:13,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3493506.6666666665, ans=0.125 2023-11-28 11:59:14,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3493506.6666666665, ans=0.2 2023-11-28 11:59:21,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.923e+01 9.384e+01 1.033e+02 1.328e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 11:59:32,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-28 11:59:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3493640.0, ans=0.125 2023-11-28 11:59:47,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3493706.6666666665, ans=0.125 2023-11-28 11:59:51,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3493706.6666666665, ans=0.0 2023-11-28 12:00:05,206 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7050, loss[loss=0.07356, simple_loss=0.1045, pruned_loss=0.01286, audio_tagging_loss=0.008435, over 15517.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08899, pruned_loss=0.01199, audio_tagging_loss=0.008711, over 3042210.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:00:31,307 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-28 12:00:36,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3493973.3333333335, ans=0.0 2023-11-28 12:00:49,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3494040.0, ans=0.0 2023-11-28 12:00:57,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3494106.6666666665, ans=0.0 2023-11-28 12:01:03,906 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7100, loss[loss=0.05362, simple_loss=0.06191, pruned_loss=0.009359, audio_tagging_loss=0.01331, over 14371.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.088, pruned_loss=0.01195, audio_tagging_loss=0.008895, over 3040545.39 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:01:18,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.879e+01 9.631e+01 1.062e+02 1.360e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 12:01:29,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-28 12:01:30,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3494306.6666666665, ans=0.125 2023-11-28 12:01:39,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3494373.3333333335, ans=0.125 2023-11-28 12:01:41,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2023-11-28 12:01:53,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=12.0 2023-11-28 12:02:01,618 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7150, loss[loss=0.06063, simple_loss=0.08792, pruned_loss=0.005911, audio_tagging_loss=0.01076, over 15730.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08963, pruned_loss=0.01221, audio_tagging_loss=0.008961, over 3047715.13 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:02:02,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3494506.6666666665, ans=0.0 2023-11-28 12:02:08,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-28 12:02:08,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-28 12:02:10,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=15.0 2023-11-28 12:02:12,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3494573.3333333335, ans=0.5 2023-11-28 12:02:27,468 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-28 12:02:47,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3494773.3333333335, ans=0.2 2023-11-28 12:02:59,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:00,041 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7200, loss[loss=0.07479, simple_loss=0.1131, pruned_loss=0.01276, audio_tagging_loss=0.005502, over 14801.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08965, pruned_loss=0.01209, audio_tagging_loss=0.008942, over 3049010.01 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:03:05,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3494840.0, ans=0.0 2023-11-28 12:03:09,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3494840.0, ans=0.0 2023-11-28 12:03:10,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3494906.6666666665, ans=0.125 2023-11-28 12:03:11,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3494906.6666666665, ans=0.2 2023-11-28 12:03:15,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.816e+01 9.709e+01 1.018e+02 1.271e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 12:03:19,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3494906.6666666665, ans=0.125 2023-11-28 12:03:24,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3494973.3333333335, ans=0.1 2023-11-28 12:03:25,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. 
limit=15.0 2023-11-28 12:03:25,929 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-28 12:03:57,756 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7250, loss[loss=0.07517, simple_loss=0.1123, pruned_loss=0.01419, audio_tagging_loss=0.004848, over 15489.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09025, pruned_loss=0.01221, audio_tagging_loss=0.008867, over 3051781.68 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:04:11,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3495240.0, ans=0.125 2023-11-28 12:04:23,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-28 12:04:24,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3495306.6666666665, ans=0.0 2023-11-28 12:04:38,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3495373.3333333335, ans=0.0 2023-11-28 12:04:40,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3495373.3333333335, ans=10.0 2023-11-28 12:04:52,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3495440.0, ans=0.2 2023-11-28 12:04:56,017 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7300, loss[loss=0.06453, simple_loss=0.0875, pruned_loss=0.01286, audio_tagging_loss=0.007924, over 14333.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09018, pruned_loss=0.0121, audio_tagging_loss=0.008818, over 3049251.42 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:05:12,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.741e+01 9.294e+01 1.019e+02 1.260e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 12:05:21,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-28 12:05:30,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-28 12:05:31,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3495706.6666666665, ans=0.0 2023-11-28 12:05:31,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2023-11-28 12:05:54,095 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7350, loss[loss=0.05555, simple_loss=0.0715, pruned_loss=0.01055, audio_tagging_loss=0.00924, over 15143.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09045, pruned_loss=0.01214, audio_tagging_loss=0.0086, over 3053935.26 frames. 
], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:05:57,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3495840.0, ans=0.2 2023-11-28 12:06:11,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3495906.6666666665, ans=0.2 2023-11-28 12:06:12,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3495906.6666666665, ans=0.125 2023-11-28 12:06:12,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2023-11-28 12:06:19,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-28 12:06:27,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3495973.3333333335, ans=0.125 2023-11-28 12:06:32,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3496040.0, ans=0.125 2023-11-28 12:06:35,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3496040.0, ans=0.0 2023-11-28 12:06:49,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3496106.6666666665, ans=0.2 2023-11-28 12:06:51,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3496106.6666666665, ans=0.125 2023-11-28 12:06:53,750 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7400, loss[loss=0.06182, simple_loss=0.09254, pruned_loss=0.006439, audio_tagging_loss=0.009113, over 14979.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09029, pruned_loss=0.0121, audio_tagging_loss=0.008502, over 3048833.05 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:07:05,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2023-11-28 12:07:09,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.769e+01 9.562e+01 1.042e+02 1.496e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:07:19,017 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-28 12:07:50,866 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7450, loss[loss=0.08525, simple_loss=0.1131, pruned_loss=0.01902, audio_tagging_loss=0.009668, over 14987.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09035, pruned_loss=0.01215, audio_tagging_loss=0.008476, over 3041146.47 frames. 
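
Note: the constant lr: 1.53e-03 in these records follows from the Eden schedule (icefall's optim.py) with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config dump at the top of the log. A sketch of that schedule, with the warmup term omitted and the epoch argument counting completed epochs (an assumption that makes the numbers line up):

def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500.0, lr_epochs=3.5):
    # Eden: the lr decays with both the batch count and the epoch count.
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# Batch idx ~523250 during epoch 44 (i.e. 43 completed epochs):
print(f"{eden_lr(523250, 43):.2e}")  # -> 1.53e-03, matching the logged lr
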
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:07:58,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3496506.6666666665, ans=0.0 2023-11-28 12:08:17,118 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-28 12:08:31,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3496706.6666666665, ans=0.125 2023-11-28 12:08:35,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3496773.3333333335, ans=0.0 2023-11-28 12:08:43,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3496773.3333333335, ans=0.1 2023-11-28 12:08:49,253 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7500, loss[loss=0.0715, simple_loss=0.1006, pruned_loss=0.01403, audio_tagging_loss=0.007184, over 16061.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08981, pruned_loss=0.01219, audio_tagging_loss=0.008528, over 3051165.18 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:09:05,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.807e+01 9.534e+01 1.017e+02 1.454e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 12:09:06,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3496906.6666666665, ans=0.1 2023-11-28 12:09:11,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3496973.3333333335, ans=0.125 2023-11-28 12:09:14,159 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-28 12:09:45,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497173.3333333335, ans=0.1 2023-11-28 12:09:46,908 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7550, loss[loss=0.05999, simple_loss=0.07768, pruned_loss=0.01204, audio_tagging_loss=0.009118, over 16453.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08947, pruned_loss=0.01219, audio_tagging_loss=0.008562, over 3056303.37 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:09:52,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3497173.3333333335, ans=0.125 2023-11-28 12:09:53,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3497173.3333333335, ans=0.2 2023-11-28 12:10:06,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. 
limit=6.0 2023-11-28 12:10:11,089 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-28 12:10:32,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3497440.0, ans=0.0 2023-11-28 12:10:38,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3497440.0, ans=0.05 2023-11-28 12:10:41,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3497440.0, ans=0.1 2023-11-28 12:10:43,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3497506.6666666665, ans=0.0 2023-11-28 12:10:44,041 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7600, loss[loss=0.06111, simple_loss=0.08486, pruned_loss=0.008904, audio_tagging_loss=0.009776, over 14751.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08892, pruned_loss=0.01208, audio_tagging_loss=0.00858, over 3051765.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:10:45,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-28 12:10:47,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=12.0 2023-11-28 12:11:00,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.865e+01 9.544e+01 1.025e+02 1.373e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 12:11:09,784 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-28 12:11:41,838 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7650, loss[loss=0.06234, simple_loss=0.08567, pruned_loss=0.01046, audio_tagging_loss=0.009045, over 15076.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08902, pruned_loss=0.01217, audio_tagging_loss=0.008552, over 3049774.96 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:11:52,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3497906.6666666665, ans=0.125 2023-11-28 12:12:08,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-28 12:12:18,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2023-11-28 12:12:19,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3498040.0, ans=0.0 2023-11-28 12:12:41,304 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7700, loss[loss=0.08069, simple_loss=0.1076, pruned_loss=0.01825, audio_tagging_loss=0.00864, over 15019.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09001, pruned_loss=0.01216, audio_tagging_loss=0.008498, over 3045772.71 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:12:45,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3498173.3333333335, ans=0.0 2023-11-28 12:12:49,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. 
limit=6.0 2023-11-28 12:12:57,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.890e+01 9.413e+01 1.018e+02 1.310e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 12:12:59,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=3498240.0, ans=15.0 2023-11-28 12:13:05,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-28 12:13:10,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3498306.6666666665, ans=0.125 2023-11-28 12:13:12,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498306.6666666665, ans=0.1 2023-11-28 12:13:18,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498373.3333333335, ans=0.1 2023-11-28 12:13:20,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-28 12:13:36,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3498440.0, ans=0.125 2023-11-28 12:13:38,341 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7750, loss[loss=0.06293, simple_loss=0.07795, pruned_loss=0.01477, audio_tagging_loss=0.009184, over 14745.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09002, pruned_loss=0.01218, audio_tagging_loss=0.008566, over 3043783.14 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:13:46,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3498506.6666666665, ans=0.125 2023-11-28 12:13:57,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-28 12:13:57,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3498573.3333333335, ans=0.0 2023-11-28 12:13:58,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=3498573.3333333335, ans=22.5 2023-11-28 12:14:03,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3498640.0, ans=0.09899494936611666 2023-11-28 12:14:03,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-28 12:14:17,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3498706.6666666665, ans=0.0 2023-11-28 12:14:19,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3498706.6666666665, ans=0.2 2023-11-28 12:14:35,913 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7800, loss[loss=0.0526, simple_loss=0.0713, pruned_loss=0.007956, audio_tagging_loss=0.008995, over 14901.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09019, pruned_loss=0.01211, audio_tagging_loss=0.008614, over 3044228.96 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:14:41,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3498840.0, ans=0.0 2023-11-28 12:14:54,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.929e+01 9.549e+01 1.038e+02 1.507e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 12:14:56,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3498906.6666666665, ans=0.125 2023-11-28 12:15:02,438 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-28 12:15:09,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2023-11-28 12:15:24,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3499106.6666666665, ans=0.125 2023-11-28 12:15:34,939 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7850, loss[loss=0.04845, simple_loss=0.06325, pruned_loss=0.007227, audio_tagging_loss=0.009596, over 15290.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09113, pruned_loss=0.01232, audio_tagging_loss=0.008621, over 3040252.68 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:15:35,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3499173.3333333335, ans=0.125 2023-11-28 12:15:56,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-11-28 12:15:59,615 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-28 12:16:14,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3499373.3333333335, ans=0.125 2023-11-28 12:16:27,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3499440.0, ans=0.0 2023-11-28 12:16:32,395 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7900, loss[loss=0.05532, simple_loss=0.0732, pruned_loss=0.01167, audio_tagging_loss=0.00704, over 15563.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09075, pruned_loss=0.01226, audio_tagging_loss=0.008608, over 3036816.20 frames. 
], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:16:32,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3499506.6666666665, ans=0.025 2023-11-28 12:16:32,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499506.6666666665, ans=0.1 2023-11-28 12:16:35,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3499506.6666666665, ans=0.04949747468305833 2023-11-28 12:16:35,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3499506.6666666665, ans=0.125 2023-11-28 12:16:38,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3499506.6666666665, ans=0.1 2023-11-28 12:16:39,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3499506.6666666665, ans=0.125 2023-11-28 12:16:49,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.884e+01 9.612e+01 1.005e+02 1.246e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 12:16:55,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3499640.0, ans=0.0 2023-11-28 12:16:57,354 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-28 12:17:12,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3499706.6666666665, ans=0.0 2023-11-28 12:17:26,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3499773.3333333335, ans=0.125 2023-11-28 12:17:28,985 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 7950, loss[loss=0.07111, simple_loss=0.09511, pruned_loss=0.01281, audio_tagging_loss=0.01074, over 15213.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09049, pruned_loss=0.01223, audio_tagging_loss=0.008809, over 3047335.65 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:17:50,049 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:17:55,459 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-28 12:18:15,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3500106.6666666665, ans=0.0 2023-11-28 12:18:27,118 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8000, loss[loss=0.07269, simple_loss=0.09352, pruned_loss=0.01516, audio_tagging_loss=0.01077, over 17052.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08954, pruned_loss=0.01208, audio_tagging_loss=0.008905, over 3045126.26 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:18:27,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.13 vs. 
limit=15.0 2023-11-28 12:18:34,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2023-11-28 12:18:36,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3500173.3333333335, ans=0.1 2023-11-28 12:18:45,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.932e+01 9.362e+01 1.010e+02 1.203e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 12:18:52,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-28 12:18:55,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3500306.6666666665, ans=0.0 2023-11-28 12:18:55,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3500306.6666666665, ans=0.125 2023-11-28 12:18:59,674 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:19:05,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3500373.3333333335, ans=0.1 2023-11-28 12:19:12,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3500440.0, ans=0.2 2023-11-28 12:19:17,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.30 vs. limit=10.0 2023-11-28 12:19:25,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-28 12:19:25,997 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8050, loss[loss=0.07038, simple_loss=0.09133, pruned_loss=0.01555, audio_tagging_loss=0.00917, over 16263.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08902, pruned_loss=0.01208, audio_tagging_loss=0.00893, over 3045357.52 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:19:27,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3500506.6666666665, ans=0.0 2023-11-28 12:19:32,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3500506.6666666665, ans=0.0 2023-11-28 12:19:45,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-11-28 12:19:50,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-28 12:19:51,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3500640.0, ans=0.125 2023-11-28 12:20:16,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0 2023-11-28 12:20:21,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3500773.3333333335, ans=0.2 2023-11-28 12:20:23,013 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8100, loss[loss=0.0628, simple_loss=0.08589, pruned_loss=0.01104, audio_tagging_loss=0.008815, over 16228.00 frames. 
], tot_loss[loss=0.06596, simple_loss=0.08993, pruned_loss=0.01223, audio_tagging_loss=0.00877, over 3042615.16 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:20:32,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3500840.0, ans=0.05 2023-11-28 12:20:40,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.823e+01 9.483e+01 1.016e+02 1.288e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 12:20:48,953 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-28 12:20:56,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3500973.3333333335, ans=0.1 2023-11-28 12:20:58,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3501040.0, ans=0.2 2023-11-28 12:20:59,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3501040.0, ans=0.0 2023-11-28 12:21:20,703 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8150, loss[loss=0.06882, simple_loss=0.09188, pruned_loss=0.01384, audio_tagging_loss=0.009043, over 16430.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08953, pruned_loss=0.0121, audio_tagging_loss=0.008623, over 3041124.85 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:21:29,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=12.0 2023-11-28 12:21:37,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3501240.0, ans=0.125 2023-11-28 12:21:38,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3501240.0, ans=0.0 2023-11-28 12:21:41,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3501240.0, ans=0.125 2023-11-28 12:21:42,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3501240.0, ans=0.1 2023-11-28 12:21:46,482 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-28 12:22:04,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3501373.3333333335, ans=0.2 2023-11-28 12:22:17,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3501440.0, ans=0.125 2023-11-28 12:22:19,403 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8200, loss[loss=0.07185, simple_loss=0.1081, pruned_loss=0.01234, audio_tagging_loss=0.005481, over 15716.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08996, pruned_loss=0.01218, audio_tagging_loss=0.008455, over 3049624.32 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:22:21,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3501506.6666666665, ans=0.07 2023-11-28 12:22:25,433 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:22:26,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3501506.6666666665, ans=0.125 2023-11-28 12:22:36,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 8.686e+01 9.334e+01 1.013e+02 1.382e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 12:22:40,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3501573.3333333335, ans=0.1 2023-11-28 12:22:44,797 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-28 12:22:50,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-28 12:23:03,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3501706.6666666665, ans=0.0 2023-11-28 12:23:15,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3501840.0, ans=0.0 2023-11-28 12:23:16,946 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8250, loss[loss=0.04749, simple_loss=0.05929, pruned_loss=0.007773, audio_tagging_loss=0.01007, over 14986.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09036, pruned_loss=0.01218, audio_tagging_loss=0.008448, over 3039345.33 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:23:39,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-28 12:23:42,780 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-28 12:23:46,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-28 12:23:51,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3502040.0, ans=0.125 2023-11-28 12:23:55,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3502040.0, ans=0.125 2023-11-28 12:24:14,671 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8300, loss[loss=0.08486, simple_loss=0.1174, pruned_loss=0.01962, audio_tagging_loss=0.006527, over 15198.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09051, pruned_loss=0.01223, audio_tagging_loss=0.008513, over 3048198.34 frames. 
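
Note: the three WARNING records in this section (around batches 6900, 7950 and 8200) all drop the same kind of cut: a 1-second AudioSet placeholder whose 100 feature frames shrink to 23 under the factor-4 subsampling front end, consistent with T -> ((T - 7) // 2) // 2, while its dummy transcript tokenizes to 24 BPE tokens. A transducer cannot emit 24 symbols within 23 frames, so the cut is unusable for the ASR loss and is excluded. A sketch of that validity check, with the subsampling formula an assumption inferred from the logged 100 -> 23:

def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # Frames surviving the Conv2d subsampling front end (overall factor ~4);
    # the exact formula is assumed, chosen to match the logged 100 -> 23.
    frames_after = ((num_frames_before_subsampling - 7) // 2) // 2
    # The transducer must emit every token in some frame, so require T >= U.
    return frames_after >= num_tokens

assert keep_cut(100, 24) is False  # the placeholder cuts excluded above
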
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:24:32,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3502240.0, ans=0.0 2023-11-28 12:24:32,923 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.723e+01 9.443e+01 1.022e+02 1.604e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 12:24:38,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3502306.6666666665, ans=0.125 2023-11-28 12:24:40,416 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-28 12:24:43,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3502306.6666666665, ans=0.125 2023-11-28 12:24:45,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3502306.6666666665, ans=0.1 2023-11-28 12:24:57,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3502373.3333333335, ans=0.125 2023-11-28 12:25:02,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3502440.0, ans=0.0 2023-11-28 12:25:07,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3502440.0, ans=0.2 2023-11-28 12:25:12,871 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8350, loss[loss=0.0744, simple_loss=0.1086, pruned_loss=0.01529, audio_tagging_loss=0.004811, over 16017.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09036, pruned_loss=0.01235, audio_tagging_loss=0.008559, over 3048184.65 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:25:13,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3502506.6666666665, ans=0.125 2023-11-28 12:25:17,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2023-11-28 12:25:19,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3502506.6666666665, ans=0.125 2023-11-28 12:25:19,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3502506.6666666665, ans=0.0 2023-11-28 12:25:20,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. 
limit=6.0 2023-11-28 12:25:37,659 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-28 12:25:43,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3502640.0, ans=0.125 2023-11-28 12:25:48,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3502706.6666666665, ans=0.125 2023-11-28 12:26:02,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3502773.3333333335, ans=0.2 2023-11-28 12:26:10,965 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8400, loss[loss=0.06037, simple_loss=0.08314, pruned_loss=0.01126, audio_tagging_loss=0.007534, over 15068.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.0905, pruned_loss=0.0125, audio_tagging_loss=0.008478, over 3045875.81 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:26:13,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3502840.0, ans=0.125 2023-11-28 12:26:19,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3502840.0, ans=0.125 2023-11-28 12:26:29,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.791e+01 9.357e+01 9.969e+01 1.892e+02, threshold=1.871e+02, percent-clipped=1.0 2023-11-28 12:26:30,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3502906.6666666665, ans=0.0 2023-11-28 12:26:36,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-28 12:26:43,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=22.5 2023-11-28 12:27:07,925 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8450, loss[loss=0.05121, simple_loss=0.06489, pruned_loss=0.01008, audio_tagging_loss=0.008689, over 14956.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09022, pruned_loss=0.01261, audio_tagging_loss=0.008571, over 3050413.14 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:27:08,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503173.3333333335, ans=0.1 2023-11-28 12:27:29,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3503240.0, ans=0.125 2023-11-28 12:27:33,482 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-28 12:27:37,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-28 12:27:47,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3503373.3333333335, ans=15.0 2023-11-28 12:28:06,303 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8500, loss[loss=0.0676, simple_loss=0.09733, pruned_loss=0.01126, audio_tagging_loss=0.007676, over 14956.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08914, pruned_loss=0.01238, audio_tagging_loss=0.008681, over 3051263.08 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:28:24,291 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.935e+01 9.397e+01 1.006e+02 1.238e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 12:28:31,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-28 12:28:36,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3503640.0, ans=0.125 2023-11-28 12:28:37,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3503640.0, ans=0.125 2023-11-28 12:29:03,081 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8550, loss[loss=0.05935, simple_loss=0.08583, pruned_loss=0.009661, audio_tagging_loss=0.006778, over 15275.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08892, pruned_loss=0.01235, audio_tagging_loss=0.008653, over 3052443.47 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:29:04,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3503840.0, ans=0.0 2023-11-28 12:29:11,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3503840.0, ans=0.125 2023-11-28 12:29:24,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3503906.6666666665, ans=0.125 2023-11-28 12:29:28,515 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-28 12:30:00,689 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8600, loss[loss=0.07133, simple_loss=0.1135, pruned_loss=0.007163, audio_tagging_loss=0.007409, over 14997.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08987, pruned_loss=0.01242, audio_tagging_loss=0.008766, over 3050677.51 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:30:03,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-11-28 12:30:19,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.911e+01 9.426e+01 1.012e+02 1.309e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 12:30:24,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-28 12:30:26,582 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-28 12:30:31,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-28 12:30:52,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3504440.0, ans=0.125 2023-11-28 12:30:56,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3504440.0, ans=0.125 2023-11-28 12:30:59,386 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8650, loss[loss=0.07822, simple_loss=0.1177, pruned_loss=0.01241, audio_tagging_loss=0.006975, over 15001.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09025, pruned_loss=0.01247, audio_tagging_loss=0.008774, over 3048751.91 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:31:07,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3504506.6666666665, ans=0.125 2023-11-28 12:31:20,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3504640.0, ans=0.04949747468305833 2023-11-28 12:31:24,154 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-28 12:31:26,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3504640.0, ans=0.2 2023-11-28 12:31:56,499 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8700, loss[loss=0.04651, simple_loss=0.06829, pruned_loss=0.003555, audio_tagging_loss=0.008813, over 14970.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09015, pruned_loss=0.01242, audio_tagging_loss=0.008865, over 3042344.98 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:32:08,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3504906.6666666665, ans=0.125 2023-11-28 12:32:14,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.928e+01 9.510e+01 1.026e+02 1.329e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 12:32:21,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-28 12:32:25,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-11-28 12:32:30,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2023-11-28 12:32:46,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3505106.6666666665, ans=0.2 2023-11-28 12:32:53,015 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8750, loss[loss=0.05075, simple_loss=0.0694, pruned_loss=0.007096, audio_tagging_loss=0.008961, over 14074.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09032, pruned_loss=0.01234, audio_tagging_loss=0.008929, over 3043426.71 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:32:56,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0 2023-11-28 12:33:03,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3505173.3333333335, ans=0.0 2023-11-28 12:33:05,515 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:33:05,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2023-11-28 12:33:19,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-28 12:33:42,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3505440.0, ans=0.125 2023-11-28 12:33:51,365 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8800, loss[loss=0.05066, simple_loss=0.06065, pruned_loss=0.008445, audio_tagging_loss=0.01189, over 13727.00 frames. 
], tot_loss[loss=0.06659, simple_loss=0.09023, pruned_loss=0.01239, audio_tagging_loss=0.009087, over 3045631.02 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:33:59,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3505506.6666666665, ans=0.125 2023-11-28 12:34:09,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.660e+01 8.935e+01 9.643e+01 1.033e+02 1.238e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 12:34:16,076 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-28 12:34:24,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-11-28 12:34:29,542 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:34:48,673 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8850, loss[loss=0.05723, simple_loss=0.07365, pruned_loss=0.01076, audio_tagging_loss=0.009644, over 15105.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09013, pruned_loss=0.01225, audio_tagging_loss=0.009035, over 3048550.11 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:35:04,100 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:35:06,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3505906.6666666665, ans=0.125 2023-11-28 12:35:07,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3505906.6666666665, ans=0.1 2023-11-28 12:35:10,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-28 12:35:14,083 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-28 12:35:22,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3506040.0, ans=0.125 2023-11-28 12:35:24,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3506040.0, ans=0.125 2023-11-28 12:35:28,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-28 12:35:29,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3506040.0, ans=0.125 2023-11-28 12:35:38,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2023-11-28 12:35:45,389 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8900, loss[loss=0.07358, simple_loss=0.09962, pruned_loss=0.01759, audio_tagging_loss=0.006182, over 15072.00 frames. 
], tot_loss[loss=0.06607, simple_loss=0.0899, pruned_loss=0.01223, audio_tagging_loss=0.00889, over 3049161.30 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:35:45,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3506173.3333333335, ans=0.0 2023-11-28 12:35:54,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3506173.3333333335, ans=0.125 2023-11-28 12:35:57,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3506240.0, ans=0.125 2023-11-28 12:36:04,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.621e+01 9.140e+01 9.989e+01 1.171e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 12:36:11,337 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-28 12:36:19,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3506373.3333333335, ans=0.1 2023-11-28 12:36:22,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3506373.3333333335, ans=0.125 2023-11-28 12:36:26,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-28 12:36:32,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.85 vs. limit=10.0 2023-11-28 12:36:42,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3506506.6666666665, ans=0.125 2023-11-28 12:36:43,089 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 8950, loss[loss=0.08104, simple_loss=0.1192, pruned_loss=0.01613, audio_tagging_loss=0.005301, over 16217.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09069, pruned_loss=0.01237, audio_tagging_loss=0.008704, over 3060178.97 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:36:54,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3506573.3333333335, ans=0.125 2023-11-28 12:37:06,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3506640.0, ans=0.2 2023-11-28 12:37:07,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-28 12:37:17,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2023-11-28 12:37:40,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-28 12:37:40,513 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9000, loss[loss=0.08747, simple_loss=0.119, pruned_loss=0.0193, audio_tagging_loss=0.008647, over 15493.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09083, pruned_loss=0.01243, audio_tagging_loss=0.008605, over 3061363.10 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:37:40,514 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 12:38:06,737 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8243, 5.8843, 5.9198, 5.9088], device='cuda:1') 2023-11-28 12:38:15,233 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05875, simple_loss=0.05057, pruned_loss=0.005344, audio_tagging_loss=0.02812, over 4681554.00 frames. 2023-11-28 12:38:15,233 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 12:38:35,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.830e+01 9.439e+01 1.037e+02 1.240e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 12:38:41,582 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-28 12:38:54,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3507040.0, ans=0.125 2023-11-28 12:39:11,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2023-11-28 12:39:13,641 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9050, loss[loss=0.06432, simple_loss=0.0912, pruned_loss=0.01192, audio_tagging_loss=0.006803, over 15148.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09056, pruned_loss=0.01231, audio_tagging_loss=0.008548, over 3059640.85 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:39:18,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3507173.3333333335, ans=0.125 2023-11-28 12:39:21,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3507173.3333333335, ans=0.125 2023-11-28 12:39:23,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3507173.3333333335, ans=0.125 2023-11-28 12:39:23,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.73 vs. limit=10.0 2023-11-28 12:39:26,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-28 12:39:38,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-28 12:39:44,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3507306.6666666665, ans=0.2 2023-11-28 12:39:47,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-28 12:39:49,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-28 12:39:51,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. 
limit=10.0 2023-11-28 12:39:56,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-28 12:40:11,479 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9100, loss[loss=0.07934, simple_loss=0.1146, pruned_loss=0.01518, audio_tagging_loss=0.006862, over 15423.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09114, pruned_loss=0.01236, audio_tagging_loss=0.008486, over 3054569.04 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:40:17,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3507506.6666666665, ans=0.125 2023-11-28 12:40:27,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3507573.3333333335, ans=0.2 2023-11-28 12:40:30,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.014e+01 9.601e+01 1.029e+02 1.341e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 12:40:36,363 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-28 12:40:42,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3507640.0, ans=0.125 2023-11-28 12:40:43,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2023-11-28 12:40:50,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3507706.6666666665, ans=0.125 2023-11-28 12:40:54,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3507706.6666666665, ans=0.125 2023-11-28 12:40:54,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2023-11-28 12:41:08,324 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9150, loss[loss=0.06304, simple_loss=0.09527, pruned_loss=0.009631, audio_tagging_loss=0.005772, over 15706.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.0921, pruned_loss=0.01256, audio_tagging_loss=0.008361, over 3056489.31 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:41:09,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=15.0 2023-11-28 12:41:09,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3507840.0, ans=0.0 2023-11-28 12:41:18,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3507906.6666666665, ans=0.125 2023-11-28 12:41:26,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. 
limit=15.0 2023-11-28 12:41:34,213 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-28 12:41:51,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3508040.0, ans=0.0 2023-11-28 12:42:05,858 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9200, loss[loss=0.04842, simple_loss=0.06656, pruned_loss=0.006117, audio_tagging_loss=0.009022, over 14859.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09169, pruned_loss=0.01251, audio_tagging_loss=0.008349, over 3054276.52 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:42:22,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3508240.0, ans=0.5 2023-11-28 12:42:23,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3508240.0, ans=0.125 2023-11-28 12:42:25,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.609e+01 9.408e+01 1.003e+02 1.431e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 12:42:31,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-28 12:42:33,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3508306.6666666665, ans=0.0 2023-11-28 12:42:43,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3508373.3333333335, ans=0.1 2023-11-28 12:42:45,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3508373.3333333335, ans=0.125 2023-11-28 12:42:49,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-11-28 12:42:56,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3508440.0, ans=0.0 2023-11-28 12:43:02,875 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9250, loss[loss=0.07061, simple_loss=0.1017, pruned_loss=0.01199, audio_tagging_loss=0.007752, over 15193.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09016, pruned_loss=0.01228, audio_tagging_loss=0.008381, over 3058759.35 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:43:13,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. 
limit=8.0 2023-11-28 12:43:18,592 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:43:19,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3508573.3333333335, ans=0.07 2023-11-28 12:43:27,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-28 12:43:43,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3508706.6666666665, ans=15.0 2023-11-28 12:43:54,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3508773.3333333335, ans=0.0 2023-11-28 12:43:59,853 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9300, loss[loss=0.07135, simple_loss=0.08881, pruned_loss=0.0184, audio_tagging_loss=0.008547, over 14344.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08957, pruned_loss=0.01229, audio_tagging_loss=0.008481, over 3058121.95 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:44:11,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3508906.6666666665, ans=0.125 2023-11-28 12:44:13,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3508906.6666666665, ans=0.2 2023-11-28 12:44:19,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.008e+01 9.881e+01 1.066e+02 1.464e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-28 12:44:25,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-28 12:44:42,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3509040.0, ans=0.125 2023-11-28 12:44:47,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3509106.6666666665, ans=0.1 2023-11-28 12:44:57,016 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9350, loss[loss=0.06198, simple_loss=0.08594, pruned_loss=0.009698, audio_tagging_loss=0.009316, over 14805.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08933, pruned_loss=0.01214, audio_tagging_loss=0.008477, over 3051755.60 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:45:08,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3509240.0, ans=0.0 2023-11-28 12:45:21,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3509306.6666666665, ans=0.0 2023-11-28 12:45:22,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-28 12:45:22,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3509306.6666666665, ans=0.025 2023-11-28 12:45:33,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3509373.3333333335, ans=0.125 2023-11-28 12:45:42,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.83 vs. 
limit=15.0 2023-11-28 12:45:55,563 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9400, loss[loss=0.05819, simple_loss=0.08028, pruned_loss=0.0115, audio_tagging_loss=0.006555, over 14710.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08963, pruned_loss=0.01235, audio_tagging_loss=0.008596, over 3052571.02 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:45:59,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3509506.6666666665, ans=0.125 2023-11-28 12:45:59,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-11-28 12:46:01,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3509506.6666666665, ans=0.125 2023-11-28 12:46:14,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.941e+01 9.559e+01 9.955e+01 1.222e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:46:20,450 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-28 12:46:52,489 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9450, loss[loss=0.05085, simple_loss=0.06368, pruned_loss=0.00701, audio_tagging_loss=0.012, over 17110.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08893, pruned_loss=0.01224, audio_tagging_loss=0.00878, over 3053451.45 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:46:55,900 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:46:58,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509840.0, ans=0.1 2023-11-28 12:47:01,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3509840.0, ans=0.0 2023-11-28 12:47:03,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3509906.6666666665, ans=0.0 2023-11-28 12:47:14,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3509973.3333333335, ans=0.1 2023-11-28 12:47:18,069 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-28 12:47:21,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509973.3333333335, ans=0.1 2023-11-28 12:47:36,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3510040.0, ans=0.0 2023-11-28 12:47:39,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0 2023-11-28 12:47:50,097 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9500, loss[loss=0.0747, simple_loss=0.1123, pruned_loss=0.0111, audio_tagging_loss=0.007438, over 16802.00 frames. 
], tot_loss[loss=0.06523, simple_loss=0.08869, pruned_loss=0.0121, audio_tagging_loss=0.008788, over 3049465.22 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:48:09,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3510240.0, ans=0.0 2023-11-28 12:48:10,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.845e+01 9.672e+01 1.033e+02 1.277e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 12:48:15,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-28 12:48:19,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3510306.6666666665, ans=0.2 2023-11-28 12:48:37,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3510440.0, ans=0.0 2023-11-28 12:48:48,233 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9550, loss[loss=0.06298, simple_loss=0.09184, pruned_loss=0.01054, audio_tagging_loss=0.006527, over 15197.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08973, pruned_loss=0.0122, audio_tagging_loss=0.008795, over 3055527.57 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:49:07,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3510573.3333333335, ans=0.02 2023-11-28 12:49:12,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3510640.0, ans=0.5 2023-11-28 12:49:13,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526600 2023-11-28 12:49:13,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3510640.0, ans=0.125 2023-11-28 12:49:17,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3510640.0, ans=0.0 2023-11-28 12:49:30,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3510706.6666666665, ans=0.05 2023-11-28 12:49:36,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3510773.3333333335, ans=0.2 2023-11-28 12:49:46,375 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9600, loss[loss=0.05282, simple_loss=0.06934, pruned_loss=0.009521, audio_tagging_loss=0.008627, over 14579.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09078, pruned_loss=0.01236, audio_tagging_loss=0.008794, over 3056861.95 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:49:53,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. 
limit=15.0 2023-11-28 12:49:54,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3510840.0, ans=0.125 2023-11-28 12:49:57,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3510906.6666666665, ans=0.1 2023-11-28 12:50:06,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.780e+01 9.691e+01 1.026e+02 1.293e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 12:50:08,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3510973.3333333335, ans=0.0 2023-11-28 12:50:11,858 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526650 2023-11-28 12:50:12,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-11-28 12:50:21,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3511040.0, ans=0.125 2023-11-28 12:50:21,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=10.0 2023-11-28 12:50:35,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3511106.6666666665, ans=0.0 2023-11-28 12:50:44,463 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9650, loss[loss=0.05697, simple_loss=0.07453, pruned_loss=0.01041, audio_tagging_loss=0.009299, over 14928.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09051, pruned_loss=0.01227, audio_tagging_loss=0.008822, over 3061728.94 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:50:59,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3511240.0, ans=0.5 2023-11-28 12:51:02,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3511240.0, ans=0.2 2023-11-28 12:51:09,390 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526700 2023-11-28 12:51:31,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3511440.0, ans=0.0 2023-11-28 12:51:42,586 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9700, loss[loss=0.06412, simple_loss=0.08955, pruned_loss=0.009585, audio_tagging_loss=0.009762, over 14374.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0898, pruned_loss=0.01215, audio_tagging_loss=0.008754, over 3059355.28 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:52:01,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3511573.3333333335, ans=0.0 2023-11-28 12:52:03,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.945e+01 9.456e+01 1.030e+02 1.271e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 12:52:08,181 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526750 2023-11-28 12:52:08,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3511640.0, ans=0.125 2023-11-28 12:52:30,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3511773.3333333335, ans=0.1 2023-11-28 12:52:39,624 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9750, loss[loss=0.06093, simple_loss=0.08873, pruned_loss=0.008937, audio_tagging_loss=0.007626, over 15522.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08939, pruned_loss=0.01213, audio_tagging_loss=0.008665, over 3053880.94 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:52:43,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3511840.0, ans=0.125 2023-11-28 12:52:52,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3511906.6666666665, ans=0.125 2023-11-28 12:52:53,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3511906.6666666665, ans=0.2 2023-11-28 12:52:59,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3511906.6666666665, ans=0.035 2023-11-28 12:53:00,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3511906.6666666665, ans=0.125 2023-11-28 12:53:00,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3511906.6666666665, ans=0.0 2023-11-28 12:53:01,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-28 12:53:05,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526800 2023-11-28 12:53:21,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3512040.0, ans=0.0 2023-11-28 12:53:31,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3512106.6666666665, ans=0.125 2023-11-28 12:53:37,925 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9800, loss[loss=0.06867, simple_loss=0.09009, pruned_loss=0.01286, audio_tagging_loss=0.01076, over 15266.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08891, pruned_loss=0.01209, audio_tagging_loss=0.008691, over 3051237.18 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:53:42,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3512173.3333333335, ans=0.0 2023-11-28 12:53:56,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3512240.0, ans=0.1 2023-11-28 12:53:56,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0 2023-11-28 12:53:58,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.970e+01 9.668e+01 1.037e+02 1.358e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 12:54:02,874 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526850 2023-11-28 12:54:11,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-11-28 12:54:17,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3512373.3333333335, ans=0.1 2023-11-28 12:54:27,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0 2023-11-28 12:54:31,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3512440.0, ans=0.2 2023-11-28 12:54:33,067 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:54:35,750 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9850, loss[loss=0.06256, simple_loss=0.085, pruned_loss=0.01264, audio_tagging_loss=0.007423, over 16475.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08959, pruned_loss=0.01208, audio_tagging_loss=0.008706, over 3055243.62 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:54:35,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3512506.6666666665, ans=0.0 2023-11-28 12:54:37,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2023-11-28 12:54:42,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3512506.6666666665, ans=0.2 2023-11-28 12:54:44,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3512506.6666666665, ans=0.0 2023-11-28 12:55:00,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3512640.0, ans=0.125 2023-11-28 12:55:01,023 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526900 2023-11-28 12:55:01,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.76 vs. 
limit=12.0 2023-11-28 12:55:06,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=12.0 2023-11-28 12:55:22,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3512773.3333333335, ans=0.125 2023-11-28 12:55:24,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3512773.3333333335, ans=0.125 2023-11-28 12:55:33,305 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9900, loss[loss=0.05779, simple_loss=0.07197, pruned_loss=0.01293, audio_tagging_loss=0.008877, over 13843.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0899, pruned_loss=0.01209, audio_tagging_loss=0.008675, over 3051187.52 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:55:33,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3512840.0, ans=0.125 2023-11-28 12:55:36,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0 2023-11-28 12:55:55,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.239e+01 9.931e+01 1.065e+02 1.438e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-28 12:55:58,692 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 526950 2023-11-28 12:55:59,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3512973.3333333335, ans=0.125 2023-11-28 12:56:11,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. limit=10.0 2023-11-28 12:56:11,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2023-11-28 12:56:30,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-28 12:56:31,234 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 9950, loss[loss=0.0507, simple_loss=0.06707, pruned_loss=0.006758, audio_tagging_loss=0.01041, over 15804.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08968, pruned_loss=0.0121, audio_tagging_loss=0.008609, over 3052597.66 frames. 
], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:56:40,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3513173.3333333335, ans=0.07 2023-11-28 12:56:56,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527000 2023-11-28 12:57:13,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3513373.3333333335, ans=0.0 2023-11-28 12:57:16,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3513373.3333333335, ans=0.125 2023-11-28 12:57:16,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3513373.3333333335, ans=0.125 2023-11-28 12:57:18,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3513440.0, ans=0.0 2023-11-28 12:57:29,193 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10000, loss[loss=0.07318, simple_loss=0.09763, pruned_loss=0.01475, audio_tagging_loss=0.009611, over 16053.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08876, pruned_loss=0.01202, audio_tagging_loss=0.008575, over 3054588.75 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:57:33,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3513506.6666666665, ans=0.125 2023-11-28 12:57:43,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3513573.3333333335, ans=0.0 2023-11-28 12:57:48,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3513573.3333333335, ans=0.125 2023-11-28 12:57:50,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.894e+01 9.636e+01 1.033e+02 1.186e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 12:57:53,888 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527050 2023-11-28 12:57:59,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3513640.0, ans=0.125 2023-11-28 12:58:02,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3513706.6666666665, ans=0.125 2023-11-28 12:58:14,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3513773.3333333335, ans=0.125 2023-11-28 12:58:17,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3513773.3333333335, ans=0.125 2023-11-28 12:58:22,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3513773.3333333335, ans=0.5 2023-11-28 12:58:26,163 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10050, loss[loss=0.08615, simple_loss=0.1202, pruned_loss=0.01983, audio_tagging_loss=0.006203, over 14903.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08884, pruned_loss=0.01193, audio_tagging_loss=0.008537, over 3053488.54 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:58:31,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=8.0 2023-11-28 12:58:38,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3513906.6666666665, ans=0.025 2023-11-28 12:58:48,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3513973.3333333335, ans=0.04949747468305833 2023-11-28 12:58:49,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-28 12:58:51,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527100 2023-11-28 12:59:02,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514040.0, ans=0.1 2023-11-28 12:59:15,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3514106.6666666665, ans=10.0 2023-11-28 12:59:18,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3514106.6666666665, ans=0.125 2023-11-28 12:59:22,923 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10100, loss[loss=0.06843, simple_loss=0.1009, pruned_loss=0.01063, audio_tagging_loss=0.007322, over 14773.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08926, pruned_loss=0.01203, audio_tagging_loss=0.008583, over 3053691.33 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:59:39,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2023-11-28 12:59:46,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.720e+01 9.623e+01 1.026e+02 1.280e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 12:59:47,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3514306.6666666665, ans=0.1 2023-11-28 12:59:49,354 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527150 2023-11-28 12:59:56,187 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:00:03,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514373.3333333335, ans=0.1 2023-11-28 13:00:14,192 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:00:21,328 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10150, loss[loss=0.07309, simple_loss=0.09379, pruned_loss=0.01395, audio_tagging_loss=0.01225, over 16299.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08889, pruned_loss=0.01203, audio_tagging_loss=0.008654, over 3053857.06 frames. 
], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:00:24,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-28 13:00:27,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3514506.6666666665, ans=0.0 2023-11-28 13:00:30,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514506.6666666665, ans=0.1 2023-11-28 13:00:36,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514573.3333333335, ans=0.1 2023-11-28 13:00:46,709 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527200 2023-11-28 13:00:47,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3514640.0, ans=0.0 2023-11-28 13:00:53,069 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:00:55,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-11-28 13:00:57,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3514706.6666666665, ans=0.125 2023-11-28 13:01:17,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3514773.3333333335, ans=0.125 2023-11-28 13:01:19,641 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10200, loss[loss=0.0609, simple_loss=0.08061, pruned_loss=0.01202, audio_tagging_loss=0.008567, over 15248.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08874, pruned_loss=0.01183, audio_tagging_loss=0.00871, over 3064594.58 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:01:27,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3514840.0, ans=0.0 2023-11-28 13:01:31,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3514906.6666666665, ans=0.04949747468305833 2023-11-28 13:01:35,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3514906.6666666665, ans=0.125 2023-11-28 13:01:37,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3514906.6666666665, ans=0.125 2023-11-28 13:01:41,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.683e+01 9.423e+01 1.021e+02 1.647e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 13:01:44,666 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527250 2023-11-28 13:01:45,666 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:01:55,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3515040.0, ans=0.0 2023-11-28 13:02:16,831 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10250, loss[loss=0.06323, simple_loss=0.08675, pruned_loss=0.01113, audio_tagging_loss=0.008722, over 15294.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0885, pruned_loss=0.01177, audio_tagging_loss=0.008823, over 3059960.65 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:02:39,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3515306.6666666665, ans=0.125 2023-11-28 13:02:43,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527300 2023-11-28 13:02:43,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-11-28 13:03:09,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-28 13:03:14,511 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10300, loss[loss=0.05821, simple_loss=0.07994, pruned_loss=0.01173, audio_tagging_loss=0.006512, over 15392.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08971, pruned_loss=0.01199, audio_tagging_loss=0.008751, over 3067022.43 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:03:21,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-11-28 13:03:29,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3515573.3333333335, ans=0.0 2023-11-28 13:03:33,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3515573.3333333335, ans=0.0 2023-11-28 13:03:36,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.855e+01 9.649e+01 1.050e+02 1.403e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 13:03:39,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3515640.0, ans=0.125 2023-11-28 13:03:40,210 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527350 2023-11-28 13:03:43,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3515640.0, ans=0.125 2023-11-28 13:04:09,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3515773.3333333335, ans=0.125 2023-11-28 13:04:12,940 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10350, loss[loss=0.05993, simple_loss=0.08206, pruned_loss=0.006953, audio_tagging_loss=0.01195, over 15784.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08999, pruned_loss=0.01204, audio_tagging_loss=0.008833, over 3059040.70 frames. 
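The WARNING records show why these 1-second AudioSet clips get dropped: after the encoder's roughly 4x subsampling their 100 input frames shrink to 23, fewer than their 24 BPE tokens, so the transducer recipe cannot align them (it requires at least as many encoder frames as tokens). A hedged reconstruction of the filter; the exact subsampling arithmetic is an assumption chosen to match the logged 100 -> 23:

def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 convolutions with edge effects; maps 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # the dummy-text AudioSet cuts above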
], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:04:16,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3515840.0, ans=0.125 2023-11-28 13:04:24,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3515906.6666666665, ans=0.125 2023-11-28 13:04:26,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3515906.6666666665, ans=0.125 2023-11-28 13:04:26,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3515906.6666666665, ans=0.125 2023-11-28 13:04:37,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527400 2023-11-28 13:04:39,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3515973.3333333335, ans=0.125 2023-11-28 13:04:57,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3516040.0, ans=0.0 2023-11-28 13:05:10,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2023-11-28 13:05:10,556 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10400, loss[loss=0.06532, simple_loss=0.08799, pruned_loss=0.01269, audio_tagging_loss=0.008635, over 15574.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0895, pruned_loss=0.01208, audio_tagging_loss=0.008878, over 3051360.58 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:05:27,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3516240.0, ans=0.1 2023-11-28 13:05:32,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.002e+01 9.653e+01 1.031e+02 1.825e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:05:36,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527450 2023-11-28 13:06:08,059 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10450, loss[loss=0.06399, simple_loss=0.08927, pruned_loss=0.01111, audio_tagging_loss=0.008246, over 15843.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09036, pruned_loss=0.01211, audio_tagging_loss=0.008758, over 3061410.70 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:06:24,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3516573.3333333335, ans=0.0 2023-11-28 13:06:33,673 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527500 2023-11-28 13:06:38,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3516640.0, ans=0.125 2023-11-28 13:06:44,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.83 vs. 
limit=15.0 2023-11-28 13:06:45,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3516706.6666666665, ans=0.0 2023-11-28 13:06:57,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3516773.3333333335, ans=0.125 2023-11-28 13:07:05,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3516840.0, ans=0.0 2023-11-28 13:07:06,824 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10500, loss[loss=0.0589, simple_loss=0.07859, pruned_loss=0.01041, audio_tagging_loss=0.009198, over 14753.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0899, pruned_loss=0.012, audio_tagging_loss=0.008757, over 3059151.57 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:07:09,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.01 vs. limit=22.5 2023-11-28 13:07:27,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3516906.6666666665, ans=0.125 2023-11-28 13:07:28,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.731e+01 9.359e+01 1.002e+02 1.371e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 13:07:31,671 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527550 2023-11-28 13:07:32,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2023-11-28 13:07:50,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3517040.0, ans=0.07 2023-11-28 13:07:50,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-28 13:07:58,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3517106.6666666665, ans=0.125 2023-11-28 13:08:04,090 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10550, loss[loss=0.05771, simple_loss=0.08431, pruned_loss=0.007286, audio_tagging_loss=0.00827, over 14789.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08961, pruned_loss=0.01204, audio_tagging_loss=0.008671, over 3047776.62 frames. 
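The scaling.py Whitening records compare a per-module statistic against a limit (e.g. metric=16.01 vs. limit=22.5 above). The metric measures how far the covariance of the module's output, within each channel group, is from a multiple of the identity: 1.0 means perfectly "white" features, and a penalty gradient engages when the metric exceeds the limit. A sketch of one plausible such metric, hedged, not necessarily the exact formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels). Within each channel group, compare
    # the covariance's eigenvalue spread to a multiple of the identity:
    # the ratio below is 1.0 iff all eigenvalues are equal ("white").
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, cpg).permute(1, 0, 2)
    cov = xg.transpose(1, 2) @ xg / num_frames  # (num_groups, cpg, cpg)
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)).item()

# ~1.1 for white noise; large values mean ill-conditioned covariance.
print(whitening_metric(torch.randn(2000, 256), num_groups=1))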
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:08:04,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3517173.3333333335, ans=0.2 2023-11-28 13:08:21,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3517240.0, ans=0.025 2023-11-28 13:08:26,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3517306.6666666665, ans=0.0 2023-11-28 13:08:28,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527600 2023-11-28 13:08:30,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3517306.6666666665, ans=0.2 2023-11-28 13:08:38,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3517373.3333333335, ans=0.1 2023-11-28 13:08:47,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3517373.3333333335, ans=0.125 2023-11-28 13:08:57,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3517440.0, ans=0.0 2023-11-28 13:08:57,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3517440.0, ans=0.0 2023-11-28 13:09:01,979 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10600, loss[loss=0.07635, simple_loss=0.1046, pruned_loss=0.01714, audio_tagging_loss=0.006916, over 15136.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08945, pruned_loss=0.01198, audio_tagging_loss=0.008554, over 3042751.28 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:09:20,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3517573.3333333335, ans=0.0 2023-11-28 13:09:24,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.917e+01 9.595e+01 1.067e+02 1.545e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 13:09:27,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527650 2023-11-28 13:09:55,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3517773.3333333335, ans=0.035 2023-11-28 13:10:00,407 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10650, loss[loss=0.0695, simple_loss=0.09058, pruned_loss=0.01511, audio_tagging_loss=0.009105, over 14626.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08933, pruned_loss=0.012, audio_tagging_loss=0.008612, over 3055518.61 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:10:25,964 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527700 2023-11-28 13:10:42,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3518040.0, ans=0.125 2023-11-28 13:10:57,982 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10700, loss[loss=0.08837, simple_loss=0.1284, pruned_loss=0.01601, audio_tagging_loss=0.008174, over 15225.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08908, pruned_loss=0.01208, audio_tagging_loss=0.008593, over 3046777.75 frames. 
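Every scaling.py ScheduledFloat record reports the current value (ans=...) of a hyperparameter that is a function of the module's batch_count: at this depth into training (batch_count around 3.5e6) the attention/conv/feed-forward skip rates have annealed to 0.0, while bypass skip rates and balancer probabilities hold at small constants (0.07, 0.125). A minimal piecewise-linear scheduler in the same spirit (breakpoints illustrative, not the recipe's):

import bisect

class ScheduledFloat:
    """A float that varies piecewise-linearly with the global batch count."""
    def __init__(self, *points):            # (batch_count, value) pairs
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# An illustrative skip-rate schedule that has long since decayed to 0.0:
conv_skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(conv_skip_rate.value(3513440.0))  # 0.0, matching the records above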
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:11:04,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3518173.3333333335, ans=0.0 2023-11-28 13:11:17,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3518240.0, ans=0.0 2023-11-28 13:11:19,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.935e+01 9.512e+01 1.031e+02 1.313e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 13:11:22,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527750 2023-11-28 13:11:29,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3518306.6666666665, ans=0.5 2023-11-28 13:11:29,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-28 13:11:55,947 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10750, loss[loss=0.04256, simple_loss=0.05286, pruned_loss=0.005389, audio_tagging_loss=0.01075, over 14063.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08886, pruned_loss=0.01204, audio_tagging_loss=0.008598, over 3046266.34 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:12:08,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3518573.3333333335, ans=0.125 2023-11-28 13:12:21,177 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527800 2023-11-28 13:12:38,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3518706.6666666665, ans=0.0 2023-11-28 13:12:38,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3518706.6666666665, ans=0.07 2023-11-28 13:12:53,904 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10800, loss[loss=0.05092, simple_loss=0.06858, pruned_loss=0.007073, audio_tagging_loss=0.009555, over 14802.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08878, pruned_loss=0.01198, audio_tagging_loss=0.008554, over 3050588.91 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:13:02,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3518840.0, ans=0.125 2023-11-28 13:13:02,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3518840.0, ans=0.2 2023-11-28 13:13:15,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.825e+01 9.488e+01 1.009e+02 1.262e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 13:13:19,988 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527850 2023-11-28 13:13:34,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3519040.0, ans=0.1 2023-11-28 13:13:51,774 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10850, loss[loss=0.07715, simple_loss=0.1011, pruned_loss=0.01657, audio_tagging_loss=0.01002, over 15959.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.0887, pruned_loss=0.012, audio_tagging_loss=0.00855, over 3049538.23 frames. 
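Many of the ScheduledFloat records above parameterize balancers (prob=0.125, min_positive=0.025, min_abs=0.5, max_abs=10.0): modules that, with the given probability on any batch, nudge gradients so that each channel's activation statistics stay inside the configured bounds. A loose sketch of the idea using a differentiable penalty; the real scaling.py balancers modify gradients directly in backward instead:

import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.025,
                     max_positive: float = 0.95,
                     max_abs: float = 10.0) -> torch.Tensor:
    # x: (num_frames, num_channels). sigmoid(x/0.1) is a smooth stand-in
    # for "fraction of positive values", so the penalty has gradients.
    pos_frac = torch.sigmoid(x / 0.1).mean(dim=0)
    abs_mean = x.abs().mean(dim=0)
    return ((min_positive - pos_frac).clamp(min=0.0)
            + (pos_frac - max_positive).clamp(min=0.0)
            + (abs_mean - max_abs).clamp(min=0.0)).sum()

# Applied stochastically, e.g. with the logged prob:
# if torch.rand(()) < 0.125:
#     loss = loss + 0.01 * balancer_penalty(activations)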
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:14:17,164 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527900 2023-11-28 13:14:50,120 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10900, loss[loss=0.07568, simple_loss=0.1109, pruned_loss=0.01177, audio_tagging_loss=0.008439, over 15393.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08738, pruned_loss=0.01167, audio_tagging_loss=0.008672, over 3050007.94 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:14:51,281 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:15:11,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.813e+01 9.594e+01 1.027e+02 1.260e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 13:15:15,304 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 527950 2023-11-28 13:15:30,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3519706.6666666665, ans=0.0 2023-11-28 13:15:33,277 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:15:45,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3519773.3333333335, ans=0.035 2023-11-28 13:15:47,463 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 10950, loss[loss=0.0784, simple_loss=0.1141, pruned_loss=0.01469, audio_tagging_loss=0.006642, over 16091.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08818, pruned_loss=0.0118, audio_tagging_loss=0.008562, over 3049644.58 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:15:49,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-28 13:15:51,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3519840.0, ans=0.0 2023-11-28 13:15:56,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519840.0, ans=0.1 2023-11-28 13:15:59,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3519906.6666666665, ans=0.0 2023-11-28 13:16:06,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3519906.6666666665, ans=0.2 2023-11-28 13:16:12,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528000 2023-11-28 13:16:13,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3519973.3333333335, ans=22.5 2023-11-28 13:16:25,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.14 vs. 
limit=22.5 2023-11-28 13:16:25,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3520040.0, ans=0.125 2023-11-28 13:16:45,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3520106.6666666665, ans=0.125 2023-11-28 13:16:47,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2023-11-28 13:16:47,527 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11000, loss[loss=0.0751, simple_loss=0.09854, pruned_loss=0.01522, audio_tagging_loss=0.0106, over 14870.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08823, pruned_loss=0.01184, audio_tagging_loss=0.008701, over 3047859.86 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:17:01,803 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:17:09,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.312e+01 9.741e+01 1.066e+02 1.982e+02, threshold=1.948e+02, percent-clipped=1.0 2023-11-28 13:17:12,881 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528050 2023-11-28 13:17:44,889 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11050, loss[loss=0.07254, simple_loss=0.09712, pruned_loss=0.01331, audio_tagging_loss=0.01068, over 16625.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08841, pruned_loss=0.01189, audio_tagging_loss=0.00875, over 3041859.62 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:17:46,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3520506.6666666665, ans=0.07 2023-11-28 13:17:49,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-11-28 13:17:56,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3520573.3333333335, ans=0.125 2023-11-28 13:18:10,204 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528100 2023-11-28 13:18:11,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3520640.0, ans=0.125 2023-11-28 13:18:20,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3520706.6666666665, ans=0.2 2023-11-28 13:18:24,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=12.0 2023-11-28 13:18:41,363 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11100, loss[loss=0.07169, simple_loss=0.09875, pruned_loss=0.01325, audio_tagging_loss=0.009067, over 15968.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08872, pruned_loss=0.01193, audio_tagging_loss=0.008786, over 3050962.88 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:18:51,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-28 13:19:04,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.992e+01 9.660e+01 1.067e+02 1.331e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 13:19:06,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528150 2023-11-28 13:19:10,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-28 13:19:39,153 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11150, loss[loss=0.07356, simple_loss=0.1085, pruned_loss=0.0122, audio_tagging_loss=0.00712, over 16581.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08866, pruned_loss=0.01195, audio_tagging_loss=0.008991, over 3051202.56 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:19:48,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3521173.3333333335, ans=0.125 2023-11-28 13:20:04,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528200 2023-11-28 13:20:06,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3521306.6666666665, ans=0.2 2023-11-28 13:20:32,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2023-11-28 13:20:32,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3521440.0, ans=15.0 2023-11-28 13:20:36,849 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11200, loss[loss=0.08494, simple_loss=0.1111, pruned_loss=0.0186, audio_tagging_loss=0.01076, over 15117.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08772, pruned_loss=0.01197, audio_tagging_loss=0.009094, over 3045936.83 frames. 
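"batch size" in these records counts cuts, not frames, and drifts (54-61 across this stretch) because the SimpleCutSampler fills each batch up to a total-duration budget rather than a fixed cut count. A simplified sketch of duration-capped batching; the 1000 s budget is an assumption, and the real sampler also shuffles and shards across the 4 DDP ranks:

def duration_batches(cut_durations, max_duration: float = 1000.0):
    """Yield batches of cut indices whose summed duration fits the budget."""
    batch, total = [], 0.0
    for idx, dur in enumerate(cut_durations):
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(idx)
        total += dur
    if batch:
        yield batch

# Mixed LibriSpeech/AudioSet durations give the varying counts seen here.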
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:20:37,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3521506.6666666665, ans=0.0 2023-11-28 13:20:43,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3521506.6666666665, ans=0.07 2023-11-28 13:21:00,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.925e+01 9.638e+01 1.050e+02 1.394e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 13:21:03,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528250 2023-11-28 13:21:03,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3521640.0, ans=0.0 2023-11-28 13:21:13,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3521706.6666666665, ans=0.125 2023-11-28 13:21:23,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3521773.3333333335, ans=0.04949747468305833 2023-11-28 13:21:24,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3521773.3333333335, ans=0.125 2023-11-28 13:21:32,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3521773.3333333335, ans=0.0 2023-11-28 13:21:35,062 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11250, loss[loss=0.05372, simple_loss=0.07405, pruned_loss=0.008715, audio_tagging_loss=0.007982, over 16604.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08808, pruned_loss=0.01207, audio_tagging_loss=0.009051, over 3050814.81 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:22:00,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528300 2023-11-28 13:22:09,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3522040.0, ans=0.0 2023-11-28 13:22:33,040 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11300, loss[loss=0.05184, simple_loss=0.07218, pruned_loss=0.00685, audio_tagging_loss=0.008903, over 15098.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08878, pruned_loss=0.01205, audio_tagging_loss=0.008739, over 3049869.86 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:22:37,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3522173.3333333335, ans=0.07 2023-11-28 13:22:46,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3522240.0, ans=0.125 2023-11-28 13:22:48,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3522240.0, ans=0.0 2023-11-28 13:22:54,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3522306.6666666665, ans=0.125 2023-11-28 13:22:57,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.854e+01 9.489e+01 1.005e+02 1.713e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 13:22:57,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528350 2023-11-28 13:23:04,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3522306.6666666665, ans=0.0 2023-11-28 13:23:18,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3522440.0, ans=0.125 2023-11-28 13:23:21,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3522440.0, ans=0.125 2023-11-28 13:23:30,100 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11350, loss[loss=0.06133, simple_loss=0.08705, pruned_loss=0.01193, audio_tagging_loss=0.005878, over 15243.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08945, pruned_loss=0.01214, audio_tagging_loss=0.008618, over 3058305.79 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:23:31,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3522506.6666666665, ans=0.025 2023-11-28 13:23:32,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3522506.6666666665, ans=0.125 2023-11-28 13:23:40,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3522573.3333333335, ans=0.125 2023-11-28 13:23:49,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3522573.3333333335, ans=0.2 2023-11-28 13:23:56,627 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528400 2023-11-28 13:23:58,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3522640.0, ans=0.125 2023-11-28 13:24:25,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0 2023-11-28 13:24:28,360 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11400, loss[loss=0.07312, simple_loss=0.1025, pruned_loss=0.01249, audio_tagging_loss=0.009358, over 15200.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08982, pruned_loss=0.01226, audio_tagging_loss=0.008565, over 3057481.51 frames. 
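grad_scale in these records is the fp16 loss scale: it is halved whenever inf/NaN gradients are detected and grown back after a stable stretch, which is why it steps 32 -> 16 -> 8 across batches 11050-11350 and later recovers to 16 and 32 (batches 11600 and 12000 below). This is the standard torch.cuda.amp.GradScaler behaviour (constants below are illustrative, not the recipe's):

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # the scale most records in this section sit at
    backoff_factor=0.5,    # halve on inf/NaN grads: 32 -> 16 -> 8
    growth_factor=2.0,     # double again after a stable stretch
    growth_interval=2000,  # steps between growth attempts
)

# Typical step:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()   # adjusts the scale that train_asr.py then logs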
], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:24:54,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 9.209e+01 9.932e+01 1.057e+02 3.089e+02, threshold=1.986e+02, percent-clipped=1.0 2023-11-28 13:24:54,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528450 2023-11-28 13:25:03,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3523040.0, ans=0.0 2023-11-28 13:25:14,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3523106.6666666665, ans=0.1 2023-11-28 13:25:19,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-28 13:25:27,089 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11450, loss[loss=0.07579, simple_loss=0.1021, pruned_loss=0.01541, audio_tagging_loss=0.009351, over 15093.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09002, pruned_loss=0.01222, audio_tagging_loss=0.0086, over 3052403.47 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:25:38,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-28 13:25:51,785 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-28 13:25:51,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3523306.6666666665, ans=0.1 2023-11-28 13:25:51,987 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:25:56,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2023-11-28 13:26:00,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3523373.3333333335, ans=0.0 2023-11-28 13:26:09,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=3523373.3333333335, ans=12.0 2023-11-28 13:26:16,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2023-11-28 13:26:16,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3523440.0, ans=0.0 2023-11-28 13:26:17,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523440.0, ans=0.1 2023-11-28 13:26:24,139 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11500, loss[loss=0.05938, simple_loss=0.07864, pruned_loss=0.0115, audio_tagging_loss=0.008557, over 15490.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08943, pruned_loss=0.01223, audio_tagging_loss=0.00855, over 3049210.78 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:26:48,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. 
limit=15.0 2023-11-28 13:26:50,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.727e+01 9.422e+01 1.009e+02 1.518e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 13:26:50,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-28 13:26:50,628 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:27:04,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3523706.6666666665, ans=0.0 2023-11-28 13:27:08,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3523706.6666666665, ans=0.2 2023-11-28 13:27:22,071 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11550, loss[loss=0.07686, simple_loss=0.1125, pruned_loss=0.01294, audio_tagging_loss=0.007662, over 15764.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09003, pruned_loss=0.01231, audio_tagging_loss=0.008491, over 3042295.34 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:27:25,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3523840.0, ans=0.0 2023-11-28 13:27:27,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3523840.0, ans=0.125 2023-11-28 13:27:28,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=22.5 2023-11-28 13:27:47,989 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-28 13:27:50,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3523973.3333333335, ans=0.035 2023-11-28 13:27:54,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=22.5 2023-11-28 13:27:58,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3524040.0, ans=0.125 2023-11-28 13:28:03,241 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:28:05,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2023-11-28 13:28:15,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524106.6666666665, ans=0.1 2023-11-28 13:28:16,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3524106.6666666665, ans=0.02 2023-11-28 13:28:21,093 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11600, loss[loss=0.09704, simple_loss=0.1366, pruned_loss=0.02262, audio_tagging_loss=0.006115, over 16711.00 frames. 
], tot_loss[loss=0.06639, simple_loss=0.09103, pruned_loss=0.01243, audio_tagging_loss=0.008448, over 3041185.84 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:28:29,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.05 vs. limit=10.0 2023-11-28 13:28:33,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3524240.0, ans=0.0 2023-11-28 13:28:45,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 9.016e+01 9.615e+01 1.039e+02 1.434e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 13:28:46,032 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-28 13:28:50,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3524306.6666666665, ans=0.0 2023-11-28 13:29:09,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3524440.0, ans=0.125 2023-11-28 13:29:13,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3524440.0, ans=0.1 2023-11-28 13:29:18,256 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11650, loss[loss=0.06843, simple_loss=0.09798, pruned_loss=0.01227, audio_tagging_loss=0.007164, over 16261.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09145, pruned_loss=0.01264, audio_tagging_loss=0.008379, over 3039310.97 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:29:39,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2023-11-28 13:29:43,068 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-28 13:29:58,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3524706.6666666665, ans=0.125 2023-11-28 13:30:07,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3524773.3333333335, ans=0.125 2023-11-28 13:30:15,778 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11700, loss[loss=0.06003, simple_loss=0.07714, pruned_loss=0.01098, audio_tagging_loss=0.01047, over 15850.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09042, pruned_loss=0.01228, audio_tagging_loss=0.00842, over 3043867.23 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:30:28,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3524906.6666666665, ans=0.0 2023-11-28 13:30:36,338 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:30:41,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.706e+01 9.225e+01 1.001e+02 1.364e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 13:30:41,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-28 13:30:47,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. 
limit=15.0 2023-11-28 13:30:56,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3525040.0, ans=0.0 2023-11-28 13:31:05,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3525106.6666666665, ans=0.1 2023-11-28 13:31:12,750 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11750, loss[loss=0.04587, simple_loss=0.0682, pruned_loss=0.005604, audio_tagging_loss=0.006162, over 15521.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09014, pruned_loss=0.01226, audio_tagging_loss=0.008458, over 3053077.06 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:31:28,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3525240.0, ans=0.07 2023-11-28 13:31:33,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3525240.0, ans=0.125 2023-11-28 13:31:33,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3525240.0, ans=0.1 2023-11-28 13:31:39,237 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-28 13:32:11,776 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11800, loss[loss=0.05593, simple_loss=0.06457, pruned_loss=0.01174, audio_tagging_loss=0.01191, over 13432.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08965, pruned_loss=0.01213, audio_tagging_loss=0.008585, over 3046158.96 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:32:12,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3525506.6666666665, ans=0.125 2023-11-28 13:32:16,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3525506.6666666665, ans=0.2 2023-11-28 13:32:19,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3525506.6666666665, ans=0.09899494936611666 2023-11-28 13:32:20,788 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:32:36,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 9.052e+01 9.553e+01 1.035e+02 2.670e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-28 13:32:36,513 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-28 13:32:47,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2023-11-28 13:32:55,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0 2023-11-28 13:32:59,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3525773.3333333335, ans=0.0 2023-11-28 13:33:06,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-28 13:33:09,418 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11850, loss[loss=0.06109, simple_loss=0.0885, pruned_loss=0.009845, audio_tagging_loss=0.006996, over 16025.00 frames. 
], tot_loss[loss=0.06623, simple_loss=0.09077, pruned_loss=0.0122, audio_tagging_loss=0.008644, over 3044647.74 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:33:17,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3525840.0, ans=0.0 2023-11-28 13:33:35,045 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-28 13:33:53,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-28 13:34:06,365 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11900, loss[loss=0.05526, simple_loss=0.07889, pruned_loss=0.008946, audio_tagging_loss=0.006869, over 15448.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09046, pruned_loss=0.01228, audio_tagging_loss=0.008777, over 3047512.20 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:34:13,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3526173.3333333335, ans=0.125 2023-11-28 13:34:16,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3526173.3333333335, ans=0.025 2023-11-28 13:34:31,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.911e+01 9.791e+01 1.051e+02 1.188e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-28 13:34:32,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-28 13:34:53,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-28 13:35:02,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.40 vs. limit=10.0 2023-11-28 13:35:05,017 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 11950, loss[loss=0.06152, simple_loss=0.08418, pruned_loss=0.01221, audio_tagging_loss=0.007221, over 15959.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08979, pruned_loss=0.01211, audio_tagging_loss=0.008802, over 3053069.74 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:35:06,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2023-11-28 13:35:29,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-28 13:35:34,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3526640.0, ans=0.0 2023-11-28 13:35:36,130 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:35:40,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3526706.6666666665, ans=0.2 2023-11-28 13:35:44,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3526706.6666666665, ans=0.2 2023-11-28 13:35:45,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3526706.6666666665, ans=15.0 2023-11-28 13:35:46,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3526706.6666666665, ans=0.1 2023-11-28 13:35:50,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-28 13:35:57,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-28 13:36:02,065 INFO [train_asr.py:1235] (1/4) Epoch 44, batch 12000, loss[loss=0.06216, simple_loss=0.08318, pruned_loss=0.01358, audio_tagging_loss=0.006987, over 15520.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09017, pruned_loss=0.01213, audio_tagging_loss=0.00884, over 3054053.49 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:36:02,066 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 13:36:37,269 INFO [train_asr.py:1267] (1/4) Epoch 44, validation: loss=0.05811, simple_loss=0.05058, pruned_loss=0.005337, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-28 13:36:37,270 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 13:36:38,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3526840.0, ans=0.2 2023-11-28 13:36:59,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3526973.3333333335, ans=0.2 2023-11-28 13:37:00,786 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-28 13:37:01,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 8.972e+01 9.530e+01 1.015e+02 1.256e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 13:37:21,565 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 0, loss[loss=0.0738, simple_loss=0.0894, pruned_loss=0.00945, audio_tagging_loss=0.01965, over 15263.00 frames. ], tot_loss[loss=0.0738, simple_loss=0.0894, pruned_loss=0.00945, audio_tagging_loss=0.01965, over 15263.00 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:37:21,566 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 13:37:56,009 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05764, simple_loss=0.05062, pruned_loss=0.005372, audio_tagging_loss=0.02696, over 4681554.00 frames. 
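At batch 12000 the loop pauses for validation, reports the peak CUDA memory (torch.cuda.max_memory_allocated), and rolls straight into epoch 45, where the learning rate has stepped from 1.53e-03 to 1.51e-03. That step is consistent with an Eden-style schedule that decays with both the batch and the epoch count; a hedged sketch follows, where base_lr, lr_batches, lr_epochs and the exact epoch offset are assumptions tuned to reproduce the logged values:

def eden_lr(base_lr: float, batch: int, epochs_done: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay in both the batch and epoch dimensions."""
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epochs_done ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

print(round(eden_lr(0.045, 528000, 43), 5))  # ~0.00153, epoch 44 above
print(round(eden_lr(0.045, 529100, 44), 5))  # ~0.00151, start of epoch 45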
2023-11-28 13:37:56,009 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 13:37:56,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-28 13:38:17,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3527080.0, ans=0.0 2023-11-28 13:38:17,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-28 13:38:20,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3527146.6666666665, ans=0.1 2023-11-28 13:38:21,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-28 13:38:41,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2023-11-28 13:38:46,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3527280.0, ans=0.125 2023-11-28 13:38:49,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-28 13:38:53,507 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 50, loss[loss=0.07005, simple_loss=0.08199, pruned_loss=0.01261, audio_tagging_loss=0.01645, over 15068.00 frames. ], tot_loss[loss=0.07453, simple_loss=0.08998, pruned_loss=0.01257, audio_tagging_loss=0.01697, over 694921.55 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:38:57,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-28 13:39:06,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-28 13:39:11,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-28 13:39:36,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3527546.6666666665, ans=0.125 2023-11-28 13:39:41,259 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:39:45,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-28 13:39:47,093 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-28 13:39:49,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.341e+01 9.943e+01 1.065e+02 1.140e+02 1.453e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-28 13:39:51,484 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 100, loss[loss=0.07959, simple_loss=0.09968, pruned_loss=0.01995, audio_tagging_loss=0.009799, over 15785.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.08755, pruned_loss=0.01183, audio_tagging_loss=0.01599, over 1213883.33 frames. 
], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:39:53,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3527680.0, ans=0.0 2023-11-28 13:40:03,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3527746.6666666665, ans=0.1 2023-11-28 13:40:04,871 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:40:09,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:12,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3527746.6666666665, ans=0.0 2023-11-28 13:40:25,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3527880.0, ans=0.04949747468305833 2023-11-28 13:40:39,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3527946.6666666665, ans=0.125 2023-11-28 13:40:44,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3527946.6666666665, ans=0.0 2023-11-28 13:40:45,452 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-28 13:40:50,295 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 150, loss[loss=0.06428, simple_loss=0.08178, pruned_loss=0.01176, audio_tagging_loss=0.01163, over 15271.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.08674, pruned_loss=0.01175, audio_tagging_loss=0.01445, over 1620297.08 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:40:50,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 13:41:14,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3528146.6666666665, ans=0.05 2023-11-28 13:41:20,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3528146.6666666665, ans=0.2 2023-11-28 13:41:27,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3528213.3333333335, ans=0.125 2023-11-28 13:41:39,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3528280.0, ans=0.125 2023-11-28 13:41:43,798 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-28 13:41:46,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.992e+01 9.963e+01 1.064e+02 1.457e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-28 13:41:48,764 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 200, loss[loss=0.0709, simple_loss=0.09452, pruned_loss=0.01247, audio_tagging_loss=0.01116, over 14658.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.0875, pruned_loss=0.01192, audio_tagging_loss=0.01288, over 1925992.55 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:42:01,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3528413.3333333335, ans=0.0 2023-11-28 13:42:01,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3528413.3333333335, ans=0.1 2023-11-28 13:42:09,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-28 13:42:23,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3528546.6666666665, ans=0.0 2023-11-28 13:42:32,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3528546.6666666665, ans=0.125 2023-11-28 13:42:42,236 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-28 13:42:46,536 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 250, loss[loss=0.08155, simple_loss=0.1064, pruned_loss=0.01904, audio_tagging_loss=0.009318, over 15064.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.08884, pruned_loss=0.01206, audio_tagging_loss=0.01159, over 2178119.42 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:42:50,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3528680.0, ans=0.125 2023-11-28 13:42:53,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3528680.0, ans=0.0 2023-11-28 13:42:58,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3528746.6666666665, ans=0.2 2023-11-28 13:43:19,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3528813.3333333335, ans=0.125 2023-11-28 13:43:25,636 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:43:34,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3528946.6666666665, ans=0.125 2023-11-28 13:43:36,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3528946.6666666665, ans=0.1 2023-11-28 13:43:39,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-28 13:43:42,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.918e+01 9.810e+01 1.066e+02 1.328e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-28 13:43:44,758 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 300, loss[loss=0.08898, simple_loss=0.1204, pruned_loss=0.02231, audio_tagging_loss=0.006464, over 15727.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09035, pruned_loss=0.0122, audio_tagging_loss=0.01067, over 2380388.26 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:43:59,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3529080.0, ans=0.125 2023-11-28 13:44:05,497 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:44:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3529280.0, ans=0.0 2023-11-28 13:44:35,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3529280.0, ans=0.125 2023-11-28 13:44:37,724 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-28 13:44:42,420 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 350, loss[loss=0.07555, simple_loss=0.1124, pruned_loss=0.01192, audio_tagging_loss=0.007408, over 15909.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09011, pruned_loss=0.0122, audio_tagging_loss=0.01009, over 2530888.40 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:44:49,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3529346.6666666665, ans=0.125 2023-11-28 13:45:04,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3529480.0, ans=0.0 2023-11-28 13:45:05,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3529480.0, ans=0.0 2023-11-28 13:45:10,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3529480.0, ans=0.09899494936611666 2023-11-28 13:45:18,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-11-28 13:45:30,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3529613.3333333335, ans=0.0 2023-11-28 13:45:35,905 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-28 13:45:38,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 9.086e+01 9.699e+01 1.038e+02 1.395e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 13:45:40,908 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 400, loss[loss=0.06513, simple_loss=0.0915, pruned_loss=0.01058, audio_tagging_loss=0.008803, over 15907.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09079, pruned_loss=0.01239, audio_tagging_loss=0.009648, over 2648252.42 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:46:08,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3529813.3333333335, ans=0.0 2023-11-28 13:46:34,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-28 13:46:38,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3530013.3333333335, ans=0.125 2023-11-28 13:46:38,955 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 450, loss[loss=0.0711, simple_loss=0.1038, pruned_loss=0.01212, audio_tagging_loss=0.007075, over 15704.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08912, pruned_loss=0.01208, audio_tagging_loss=0.009514, over 2728111.26 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:47:32,230 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-28 13:47:35,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.821e+01 9.442e+01 9.964e+01 1.327e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 13:47:36,657 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 500, loss[loss=0.06464, simple_loss=0.0899, pruned_loss=0.0116, audio_tagging_loss=0.008084, over 15549.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08915, pruned_loss=0.01204, audio_tagging_loss=0.009274, over 2802927.31 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:47:44,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3530346.6666666665, ans=0.5 2023-11-28 13:48:00,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3530480.0, ans=0.0 2023-11-28 13:48:23,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3530613.3333333335, ans=0.125 2023-11-28 13:48:29,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-28 13:48:30,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3530613.3333333335, ans=0.125 2023-11-28 13:48:33,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3530680.0, ans=0.125 2023-11-28 13:48:34,718 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 550, loss[loss=0.05556, simple_loss=0.07033, pruned_loss=0.009883, audio_tagging_loss=0.01052, over 17084.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08896, pruned_loss=0.01205, audio_tagging_loss=0.009174, over 2856130.78 frames. ], batch size: 65, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:48:38,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3530680.0, ans=0.125 2023-11-28 13:48:52,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3530746.6666666665, ans=0.125 2023-11-28 13:48:59,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3530813.3333333335, ans=0.0 2023-11-28 13:49:00,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3530813.3333333335, ans=10.0 2023-11-28 13:49:02,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-28 13:49:14,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3530880.0, ans=0.125 2023-11-28 13:49:18,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.01 vs. 
limit=10.0 2023-11-28 13:49:19,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3530880.0, ans=0.0 2023-11-28 13:49:22,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2023-11-28 13:49:24,467 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:49:28,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-28 13:49:31,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.881e+01 9.298e+01 9.934e+01 2.506e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-28 13:49:33,538 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 600, loss[loss=0.0511, simple_loss=0.06382, pruned_loss=0.006647, audio_tagging_loss=0.01254, over 15144.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08981, pruned_loss=0.01209, audio_tagging_loss=0.009144, over 2903943.35 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:50:02,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-28 13:50:14,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3531213.3333333335, ans=0.125 2023-11-28 13:50:20,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3531280.0, ans=0.2 2023-11-28 13:50:20,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5 2023-11-28 13:50:25,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531280.0, ans=0.0 2023-11-28 13:50:27,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-28 13:50:28,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2023-11-28 13:50:30,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:31,545 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 650, loss[loss=0.04633, simple_loss=0.06134, pruned_loss=0.007785, audio_tagging_loss=0.007877, over 14571.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09066, pruned_loss=0.01229, audio_tagging_loss=0.008972, over 2932466.60 frames. 
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:50:32,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:39,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3531346.6666666665, ans=0.0 2023-11-28 13:50:41,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3531413.3333333335, ans=0.0 2023-11-28 13:51:00,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3531480.0, ans=0.125 2023-11-28 13:51:10,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-28 13:51:22,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531613.3333333335, ans=0.1 2023-11-28 13:51:24,494 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-28 13:51:27,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.012e+01 9.762e+01 1.029e+02 1.844e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 13:51:28,836 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 700, loss[loss=0.05656, simple_loss=0.0808, pruned_loss=0.009599, audio_tagging_loss=0.006563, over 15060.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08997, pruned_loss=0.01208, audio_tagging_loss=0.009005, over 2962383.29 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:51:29,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:51:30,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3531680.0, ans=0.0 2023-11-28 13:51:35,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3531680.0, ans=0.025 2023-11-28 13:51:39,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3531680.0, ans=0.125 2023-11-28 13:51:54,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3531813.3333333335, ans=0.0 2023-11-28 13:52:15,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531946.6666666665, ans=0.0 2023-11-28 13:52:18,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=12.0 2023-11-28 13:52:20,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2023-11-28 13:52:22,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-28 13:52:28,101 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 750, loss[loss=0.05711, simple_loss=0.08655, pruned_loss=0.005561, audio_tagging_loss=0.008275, over 15959.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09016, pruned_loss=0.01219, audio_tagging_loss=0.009028, over 2991384.26 frames. 
], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:52:45,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2023-11-28 13:52:52,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3532146.6666666665, ans=0.0 2023-11-28 13:53:05,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-28 13:53:08,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-28 13:53:11,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-28 13:53:13,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.31 vs. limit=15.0 2023-11-28 13:53:22,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-28 13:53:23,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3532280.0, ans=0.2 2023-11-28 13:53:25,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-28 13:53:25,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.782e+01 9.191e+01 9.653e+01 1.030e+02 1.250e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:53:26,933 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 800, loss[loss=0.06777, simple_loss=0.1009, pruned_loss=0.009664, audio_tagging_loss=0.007651, over 15149.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09002, pruned_loss=0.01225, audio_tagging_loss=0.009113, over 3010056.07 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:53:32,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3532346.6666666665, ans=0.0 2023-11-28 13:53:40,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3532413.3333333335, ans=0.2 2023-11-28 13:53:50,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532480.0, ans=0.1 2023-11-28 13:54:05,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3532546.6666666665, ans=0.0 2023-11-28 13:54:15,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:19,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.03 vs. 
limit=15.0 2023-11-28 13:54:20,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-28 13:54:20,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:21,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:24,379 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 850, loss[loss=0.0815, simple_loss=0.1095, pruned_loss=0.01644, audio_tagging_loss=0.01032, over 15811.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09026, pruned_loss=0.01227, audio_tagging_loss=0.009209, over 3015709.71 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:54:25,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3532680.0, ans=0.0 2023-11-28 13:54:33,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3532680.0, ans=0.1 2023-11-28 13:54:36,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532746.6666666665, ans=0.1 2023-11-28 13:54:38,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-28 13:54:49,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3532813.3333333335, ans=0.125 2023-11-28 13:54:51,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2023-11-28 13:55:17,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-28 13:55:18,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2023-11-28 13:55:22,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.821e+01 9.434e+01 1.007e+02 1.194e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 13:55:22,320 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 900, loss[loss=0.03994, simple_loss=0.05429, pruned_loss=0.004379, audio_tagging_loss=0.00841, over 14380.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09006, pruned_loss=0.01199, audio_tagging_loss=0.009225, over 3023343.73 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:55:40,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5 2023-11-28 13:55:55,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3533146.6666666665, ans=0.125 2023-11-28 13:56:16,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-28 13:56:21,524 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 950, loss[loss=0.07632, simple_loss=0.1041, pruned_loss=0.01373, audio_tagging_loss=0.01053, over 14323.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09015, pruned_loss=0.01207, audio_tagging_loss=0.009027, over 3029021.46 frames. 
], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:56:37,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3533413.3333333335, ans=0.07 2023-11-28 13:56:39,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2023-11-28 13:56:44,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3533480.0, ans=0.125 2023-11-28 13:56:57,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-11-28 13:57:01,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3533546.6666666665, ans=0.0 2023-11-28 13:57:14,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-28 13:57:18,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.954e+01 9.513e+01 1.020e+02 1.278e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 13:57:18,940 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1000, loss[loss=0.05248, simple_loss=0.06968, pruned_loss=0.008987, audio_tagging_loss=0.008652, over 14616.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09038, pruned_loss=0.01225, audio_tagging_loss=0.008807, over 3030704.19 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:57:37,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3533746.6666666665, ans=0.0 2023-11-28 13:57:40,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2023-11-28 13:57:42,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3533813.3333333335, ans=0.125 2023-11-28 13:57:45,908 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:58:11,627 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-28 13:58:15,919 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1050, loss[loss=0.07804, simple_loss=0.1196, pruned_loss=0.01167, audio_tagging_loss=0.006571, over 16188.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09029, pruned_loss=0.01223, audio_tagging_loss=0.008583, over 3027490.25 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:58:28,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3534080.0, ans=0.0 2023-11-28 13:58:38,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. 
limit=15.0 2023-11-28 13:58:41,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3534146.6666666665, ans=0.125 2023-11-28 13:59:00,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=12.0 2023-11-28 13:59:01,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2023-11-28 13:59:03,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3534280.0, ans=0.0 2023-11-28 13:59:09,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-28 13:59:11,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3534280.0, ans=0.125 2023-11-28 13:59:13,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-28 13:59:14,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.910e+01 9.787e+01 1.025e+02 1.500e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 13:59:14,307 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1100, loss[loss=0.05833, simple_loss=0.07751, pruned_loss=0.007415, audio_tagging_loss=0.01216, over 14381.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08897, pruned_loss=0.01196, audio_tagging_loss=0.008607, over 3027223.39 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:59:18,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-28 13:59:19,371 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:59:22,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-28 13:59:23,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534346.6666666665, ans=0.1 2023-11-28 13:59:27,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3534413.3333333335, ans=0.125 2023-11-28 13:59:30,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3534413.3333333335, ans=0.0 2023-11-28 13:59:47,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. 
limit=6.0 2023-11-28 13:59:49,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3534546.6666666665, ans=0.125 2023-11-28 13:59:51,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3534546.6666666665, ans=0.125 2023-11-28 13:59:52,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3534546.6666666665, ans=0.125 2023-11-28 13:59:56,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534546.6666666665, ans=0.1 2023-11-28 14:00:01,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3534613.3333333335, ans=0.1 2023-11-28 14:00:08,095 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-28 14:00:08,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3534613.3333333335, ans=0.125 2023-11-28 14:00:12,837 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1150, loss[loss=0.05716, simple_loss=0.08246, pruned_loss=0.008694, audio_tagging_loss=0.007234, over 15798.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08924, pruned_loss=0.01204, audio_tagging_loss=0.008638, over 3032941.03 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:00:31,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-28 14:00:46,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3534880.0, ans=0.2 2023-11-28 14:00:50,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-28 14:00:53,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3534880.0, ans=0.125 2023-11-28 14:00:58,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-28 14:01:06,350 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-28 14:01:10,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=15.0 2023-11-28 14:01:10,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 9.000e+01 9.554e+01 1.019e+02 1.286e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 14:01:10,729 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1200, loss[loss=0.0463, simple_loss=0.0635, pruned_loss=0.005545, audio_tagging_loss=0.009004, over 13925.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08822, pruned_loss=0.01202, audio_tagging_loss=0.008601, over 3032976.96 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:01:14,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:20,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:21,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3535080.0, ans=0.125 2023-11-28 14:01:23,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3535080.0, ans=0.125 2023-11-28 14:01:31,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3535080.0, ans=0.125 2023-11-28 14:01:41,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3535146.6666666665, ans=0.125 2023-11-28 14:01:54,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2023-11-28 14:02:04,430 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-28 14:02:09,393 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1250, loss[loss=0.06883, simple_loss=0.08442, pruned_loss=0.01496, audio_tagging_loss=0.01167, over 15276.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08785, pruned_loss=0.01205, audio_tagging_loss=0.008606, over 3032178.60 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:02:14,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-28 14:02:27,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3535413.3333333335, ans=0.0 2023-11-28 14:02:28,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3535413.3333333335, ans=0.125 2023-11-28 14:02:32,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3535480.0, ans=0.0 2023-11-28 14:02:52,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3535546.6666666665, ans=0.125 2023-11-28 14:02:54,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3535613.3333333335, ans=0.1 2023-11-28 14:02:55,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3535613.3333333335, ans=0.1 2023-11-28 14:02:56,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.67 vs. limit=15.0 2023-11-28 14:03:01,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2023-11-28 14:03:02,154 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-28 14:03:03,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. 
limit=22.5 2023-11-28 14:03:05,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3535613.3333333335, ans=0.0 2023-11-28 14:03:07,353 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1300, loss[loss=0.08165, simple_loss=0.1123, pruned_loss=0.01625, audio_tagging_loss=0.009253, over 15603.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08804, pruned_loss=0.01195, audio_tagging_loss=0.008589, over 3038595.16 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:03:08,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.419e+01 9.205e+01 1.020e+02 1.250e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 14:03:32,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3535813.3333333335, ans=0.0 2023-11-28 14:04:01,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-28 14:04:05,971 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1350, loss[loss=0.07381, simple_loss=0.1013, pruned_loss=0.01429, audio_tagging_loss=0.008868, over 14982.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08764, pruned_loss=0.01181, audio_tagging_loss=0.008619, over 3042415.05 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:04:12,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3536013.3333333335, ans=0.125 2023-11-28 14:04:22,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3536080.0, ans=0.2 2023-11-28 14:04:28,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3536146.6666666665, ans=0.04949747468305833 2023-11-28 14:04:30,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3536146.6666666665, ans=0.5 2023-11-28 14:04:38,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3536146.6666666665, ans=0.0 2023-11-28 14:04:50,254 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:04:59,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-28 14:05:03,925 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1400, loss[loss=0.07678, simple_loss=0.1069, pruned_loss=0.01685, audio_tagging_loss=0.006467, over 15966.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08751, pruned_loss=0.01183, audio_tagging_loss=0.008636, over 3043006.28 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:05:06,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.943e+01 9.366e+01 9.966e+01 1.345e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 14:05:06,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3536346.6666666665, ans=0.125 2023-11-28 14:05:08,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3536346.6666666665, ans=0.125 2023-11-28 14:05:44,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=12.0 2023-11-28 14:05:57,467 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-28 14:05:58,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3536613.3333333335, ans=0.0 2023-11-28 14:05:59,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-28 14:06:01,797 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1450, loss[loss=0.05336, simple_loss=0.06665, pruned_loss=0.01019, audio_tagging_loss=0.009844, over 15727.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08808, pruned_loss=0.0118, audio_tagging_loss=0.008631, over 3047154.64 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:06:04,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3536680.0, ans=0.125 2023-11-28 14:06:14,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3536746.6666666665, ans=0.125 2023-11-28 14:06:26,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3536813.3333333335, ans=0.1 2023-11-28 14:06:31,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-28 14:06:55,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-28 14:07:00,235 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1500, loss[loss=0.0604, simple_loss=0.08502, pruned_loss=0.01065, audio_tagging_loss=0.007239, over 15550.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.0877, pruned_loss=0.012, audio_tagging_loss=0.008825, over 3045221.76 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:07:02,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.037e+01 9.664e+01 1.030e+02 1.385e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 14:07:04,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. limit=10.0 2023-11-28 14:07:14,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.50 vs. 
limit=22.5 2023-11-28 14:07:23,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3537146.6666666665, ans=0.0 2023-11-28 14:07:28,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3537146.6666666665, ans=0.125 2023-11-28 14:07:29,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3537146.6666666665, ans=0.05 2023-11-28 14:07:35,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3537213.3333333335, ans=0.0 2023-11-28 14:07:53,584 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-28 14:07:54,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3537280.0, ans=0.1 2023-11-28 14:07:58,710 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1550, loss[loss=0.07444, simple_loss=0.1072, pruned_loss=0.01389, audio_tagging_loss=0.00693, over 14845.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.088, pruned_loss=0.01214, audio_tagging_loss=0.008911, over 3042793.56 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:08:01,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3537346.6666666665, ans=0.1 2023-11-28 14:08:19,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2023-11-28 14:08:27,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2023-11-28 14:08:46,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3537613.3333333335, ans=0.035 2023-11-28 14:08:48,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3537613.3333333335, ans=0.04949747468305833 2023-11-28 14:08:51,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-28 14:08:55,985 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1600, loss[loss=0.04679, simple_loss=0.05827, pruned_loss=0.007078, audio_tagging_loss=0.01057, over 14284.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.0891, pruned_loss=0.01221, audio_tagging_loss=0.008909, over 3044337.30 frames. 
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:08:57,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3537680.0, ans=0.125 2023-11-28 14:08:58,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 9.104e+01 9.583e+01 1.052e+02 1.503e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 14:09:03,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3537680.0, ans=0.125 2023-11-28 14:09:04,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3537680.0, ans=0.2 2023-11-28 14:09:10,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3537746.6666666665, ans=10.0 2023-11-28 14:09:26,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3537813.3333333335, ans=0.2 2023-11-28 14:09:48,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-28 14:09:53,813 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1650, loss[loss=0.07792, simple_loss=0.105, pruned_loss=0.01773, audio_tagging_loss=0.007702, over 14409.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08802, pruned_loss=0.01212, audio_tagging_loss=0.009027, over 3034345.76 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:10:19,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3538146.6666666665, ans=0.125 2023-11-28 14:10:21,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3538146.6666666665, ans=0.125 2023-11-28 14:10:40,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3538280.0, ans=0.125 2023-11-28 14:10:47,189 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-28 14:10:47,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3538280.0, ans=0.0 2023-11-28 14:10:47,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3538280.0, ans=10.0 2023-11-28 14:10:51,977 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1700, loss[loss=0.06253, simple_loss=0.09019, pruned_loss=0.009918, audio_tagging_loss=0.007515, over 16046.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08868, pruned_loss=0.01219, audio_tagging_loss=0.009034, over 3039911.26 frames. 
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:10:54,244 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.956e+01 9.565e+01 1.008e+02 1.733e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 14:11:13,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3538480.0, ans=0.125 2023-11-28 14:11:26,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3538546.6666666665, ans=0.1 2023-11-28 14:11:32,513 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:11:33,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3538546.6666666665, ans=0.2 2023-11-28 14:11:45,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-28 14:11:45,579 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:11:50,091 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1750, loss[loss=0.05496, simple_loss=0.07843, pruned_loss=0.008721, audio_tagging_loss=0.007029, over 14942.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08811, pruned_loss=0.01205, audio_tagging_loss=0.008971, over 3037981.56 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:12:16,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3538813.3333333335, ans=0.0 2023-11-28 14:12:27,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3538880.0, ans=0.0 2023-11-28 14:12:29,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538880.0, ans=0.1 2023-11-28 14:12:40,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3538946.6666666665, ans=0.125 2023-11-28 14:12:43,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-28 14:12:47,405 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1800, loss[loss=0.0777, simple_loss=0.101, pruned_loss=0.01923, audio_tagging_loss=0.007954, over 15735.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08857, pruned_loss=0.01206, audio_tagging_loss=0.008834, over 3037658.56 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:12:50,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.905e+01 9.378e+01 9.880e+01 1.265e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 14:13:02,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3539080.0, ans=0.05 2023-11-28 14:13:26,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3539213.3333333335, ans=0.125 2023-11-28 14:13:26,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2023-11-28 14:13:29,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.47 vs. 
limit=22.5 2023-11-28 14:13:41,512 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530900 2023-11-28 14:13:44,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3539280.0, ans=0.125 2023-11-28 14:13:46,525 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1850, loss[loss=0.05617, simple_loss=0.07439, pruned_loss=0.009252, audio_tagging_loss=0.009724, over 16013.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0896, pruned_loss=0.01216, audio_tagging_loss=0.008632, over 3036680.01 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:13:50,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2023-11-28 14:14:03,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-28 14:14:07,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3539413.3333333335, ans=0.125 2023-11-28 14:14:25,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3539546.6666666665, ans=0.125 2023-11-28 14:14:34,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3539613.3333333335, ans=12.0 2023-11-28 14:14:40,760 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 530950 2023-11-28 14:14:45,118 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1900, loss[loss=0.06195, simple_loss=0.08997, pruned_loss=0.009046, audio_tagging_loss=0.007918, over 15490.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08887, pruned_loss=0.01208, audio_tagging_loss=0.008567, over 3034195.59 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:14:46,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3539680.0, ans=0.2 2023-11-28 14:14:47,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.623e+01 9.343e+01 1.003e+02 1.342e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 14:14:52,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3539680.0, ans=0.0 2023-11-28 14:15:19,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3539880.0, ans=0.0 2023-11-28 14:15:25,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2023-11-28 14:15:36,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3539946.6666666665, ans=0.125 2023-11-28 14:15:38,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-28 14:15:43,047 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 1950, loss[loss=0.0642, simple_loss=0.08639, pruned_loss=0.0141, audio_tagging_loss=0.006903, over 15847.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08774, pruned_loss=0.01193, audio_tagging_loss=0.00864, over 3028066.05 frames. 
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:16:10,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2023-11-28 14:16:30,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5 2023-11-28 14:16:31,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540280.0, ans=0.1 2023-11-28 14:16:35,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-28 14:16:40,333 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2000, loss[loss=0.08165, simple_loss=0.1131, pruned_loss=0.0176, audio_tagging_loss=0.007479, over 15523.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.0877, pruned_loss=0.0119, audio_tagging_loss=0.008643, over 3032704.57 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:16:42,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.874e+01 9.480e+01 1.027e+02 1.449e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:17:09,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3540480.0, ans=0.125 2023-11-28 14:17:14,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3540546.6666666665, ans=0.125 2023-11-28 14:17:20,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3540546.6666666665, ans=0.0 2023-11-28 14:17:31,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0 2023-11-28 14:17:34,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-28 14:17:39,036 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2050, loss[loss=0.05461, simple_loss=0.06982, pruned_loss=0.0115, audio_tagging_loss=0.008192, over 15094.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08771, pruned_loss=0.01174, audio_tagging_loss=0.008619, over 3028943.00 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:17:48,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3540680.0, ans=0.0 2023-11-28 14:18:02,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-28 14:18:04,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-28 14:18:23,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540880.0, ans=0.125 2023-11-28 14:18:25,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3540946.6666666665, ans=15.0 2023-11-28 14:18:31,778 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-28 14:18:36,098 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2100, loss[loss=0.06303, simple_loss=0.087, pruned_loss=0.01138, audio_tagging_loss=0.008152, over 16088.00 frames. 
], tot_loss[loss=0.06456, simple_loss=0.08838, pruned_loss=0.01189, audio_tagging_loss=0.008484, over 3033375.48 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:18:39,395 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.857e+01 9.324e+01 1.026e+02 1.303e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 14:18:59,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2023-11-28 14:19:06,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3541146.6666666665, ans=0.125 2023-11-28 14:19:07,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541146.6666666665, ans=0.1 2023-11-28 14:19:17,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-11-28 14:19:21,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3541280.0, ans=0.1 2023-11-28 14:19:27,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3541280.0, ans=0.125 2023-11-28 14:19:29,745 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-28 14:19:31,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3541280.0, ans=0.125 2023-11-28 14:19:34,445 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2150, loss[loss=0.06169, simple_loss=0.08171, pruned_loss=0.01416, audio_tagging_loss=0.006673, over 14977.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08811, pruned_loss=0.01193, audio_tagging_loss=0.008548, over 3035022.11 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:20:01,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541480.0, ans=0.1 2023-11-28 14:20:01,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3541480.0, ans=0.2 2023-11-28 14:20:06,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3541480.0, ans=0.0 2023-11-28 14:20:12,122 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
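Across this log, the tot_loss[...] summaries are consistent with the total being 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight on simple_loss is inferred from the printed numbers themselves, not read from the recipe. A quick check against the batch 2100 summary above:

```python
# Recompute the Epoch 45, batch 2100 tot_loss from its printed parts.
# The 0.5 weight on simple_loss is inferred from this log, not the code.
simple_loss = 0.08838
pruned_loss = 0.01189
audio_tagging_loss = 0.008484

total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(total, 5))  # 0.06456, matching the logged tot_loss
```

The same identity reproduces the other tot_loss lines in this section to the printed precision.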
Number of tokens: 24 2023-11-28 14:20:17,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3541546.6666666665, ans=0.125 2023-11-28 14:20:22,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3541613.3333333335, ans=0.125 2023-11-28 14:20:28,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-28 14:20:29,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-28 14:20:32,776 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2200, loss[loss=0.06843, simple_loss=0.07697, pruned_loss=0.01451, audio_tagging_loss=0.01544, over 14750.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.0899, pruned_loss=0.01219, audio_tagging_loss=0.008525, over 3043295.49 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:20:34,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3541680.0, ans=0.0 2023-11-28 14:20:36,791 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.900e+01 9.511e+01 1.009e+02 1.221e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 14:20:39,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3541680.0, ans=0.2 2023-11-28 14:20:39,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3541680.0, ans=0.07 2023-11-28 14:21:19,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3541946.6666666665, ans=0.125 2023-11-28 14:21:24,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3541946.6666666665, ans=0.0 2023-11-28 14:21:26,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531300 2023-11-28 14:21:30,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542013.3333333335, ans=0.1 2023-11-28 14:21:31,077 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2250, loss[loss=0.07231, simple_loss=0.09661, pruned_loss=0.01308, audio_tagging_loss=0.01093, over 15469.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08979, pruned_loss=0.0122, audio_tagging_loss=0.008537, over 3040316.22 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:21:32,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3542013.3333333335, ans=0.125 2023-11-28 14:21:33,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542013.3333333335, ans=0.1 2023-11-28 14:21:40,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3542013.3333333335, ans=0.0 2023-11-28 14:21:46,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3542080.0, ans=0.125 2023-11-28 14:21:49,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. 
limit=22.5 2023-11-28 14:21:54,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3542146.6666666665, ans=0.125 2023-11-28 14:21:58,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2023-11-28 14:22:17,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2023-11-28 14:22:24,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531350 2023-11-28 14:22:24,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542280.0, ans=0.1 2023-11-28 14:22:28,600 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2300, loss[loss=0.06212, simple_loss=0.07943, pruned_loss=0.01419, audio_tagging_loss=0.008222, over 15186.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08987, pruned_loss=0.01228, audio_tagging_loss=0.008612, over 3037549.83 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:22:31,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3542346.6666666665, ans=10.0 2023-11-28 14:22:31,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.166e+01 9.728e+01 1.033e+02 1.405e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 14:22:53,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3542480.0, ans=0.125 2023-11-28 14:22:57,875 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:22:59,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3542480.0, ans=0.2 2023-11-28 14:23:01,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3542480.0, ans=0.125 2023-11-28 14:23:21,497 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:23:21,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531400 2023-11-28 14:23:24,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3542613.3333333335, ans=0.0 2023-11-28 14:23:26,672 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2350, loss[loss=0.0579, simple_loss=0.07952, pruned_loss=0.01067, audio_tagging_loss=0.007472, over 16149.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0894, pruned_loss=0.01223, audio_tagging_loss=0.008662, over 3040278.71 frames. 
], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:23:38,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3542746.6666666665, ans=0.2 2023-11-28 14:23:50,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3542813.3333333335, ans=0.125 2023-11-28 14:23:55,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542813.3333333335, ans=0.1 2023-11-28 14:23:57,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3542813.3333333335, ans=0.125 2023-11-28 14:23:58,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3542813.3333333335, ans=0.125 2023-11-28 14:24:05,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-11-28 14:24:14,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3542946.6666666665, ans=0.0 2023-11-28 14:24:20,483 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531450 2023-11-28 14:24:21,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3542946.6666666665, ans=0.125 2023-11-28 14:24:25,490 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2400, loss[loss=0.08905, simple_loss=0.1228, pruned_loss=0.02003, audio_tagging_loss=0.007624, over 14807.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09046, pruned_loss=0.01235, audio_tagging_loss=0.008686, over 3039054.70 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:24:28,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3543013.3333333335, ans=0.09899494936611666 2023-11-28 14:24:28,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.802e+01 9.417e+01 9.979e+01 1.299e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 14:24:30,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3543013.3333333335, ans=0.2 2023-11-28 14:24:37,622 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:24:38,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3543080.0, ans=0.2 2023-11-28 14:25:03,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3543213.3333333335, ans=0.2 2023-11-28 14:25:18,428 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531500 2023-11-28 14:25:23,328 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2450, loss[loss=0.052, simple_loss=0.06557, pruned_loss=0.009971, audio_tagging_loss=0.009244, over 15589.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09052, pruned_loss=0.01246, audio_tagging_loss=0.008753, over 3035683.48 frames. 
], batch size: 60, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:25:25,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543346.6666666665, ans=0.1 2023-11-28 14:25:28,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3543346.6666666665, ans=0.125 2023-11-28 14:25:41,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3543413.3333333335, ans=0.125 2023-11-28 14:25:54,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3543480.0, ans=0.125 2023-11-28 14:26:03,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3543546.6666666665, ans=0.5 2023-11-28 14:26:12,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-28 14:26:16,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531550 2023-11-28 14:26:21,242 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2500, loss[loss=0.06423, simple_loss=0.08716, pruned_loss=0.01305, audio_tagging_loss=0.007594, over 15324.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09121, pruned_loss=0.01261, audio_tagging_loss=0.008742, over 3041580.99 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:26:25,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.030e+01 9.693e+01 1.035e+02 1.388e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 14:26:29,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3543680.0, ans=0.125 2023-11-28 14:26:40,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3543746.6666666665, ans=0.0 2023-11-28 14:26:42,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3543746.6666666665, ans=0.125 2023-11-28 14:26:52,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-28 14:26:56,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3543880.0, ans=0.1 2023-11-28 14:26:56,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3543880.0, ans=0.0 2023-11-28 14:27:02,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3543880.0, ans=0.125 2023-11-28 14:27:14,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531600 2023-11-28 14:27:17,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3543946.6666666665, ans=0.125 2023-11-28 14:27:19,439 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2550, loss[loss=0.07212, simple_loss=0.08553, pruned_loss=0.01676, audio_tagging_loss=0.01259, over 15110.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09096, pruned_loss=0.01257, audio_tagging_loss=0.008677, over 3040711.03 frames. 
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:27:20,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3544013.3333333335, ans=0.125 2023-11-28 14:27:26,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3544013.3333333335, ans=0.125 2023-11-28 14:27:29,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-28 14:27:34,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3544080.0, ans=0.1 2023-11-28 14:27:35,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544080.0, ans=0.1 2023-11-28 14:27:35,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3544080.0, ans=0.125 2023-11-28 14:27:38,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3544080.0, ans=0.0 2023-11-28 14:27:45,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.39 vs. limit=10.0 2023-11-28 14:27:52,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3544146.6666666665, ans=0.125 2023-11-28 14:28:06,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3544280.0, ans=0.0 2023-11-28 14:28:09,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3544280.0, ans=0.125 2023-11-28 14:28:13,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531650 2023-11-28 14:28:15,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3544280.0, ans=0.1 2023-11-28 14:28:18,560 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2600, loss[loss=0.06254, simple_loss=0.08974, pruned_loss=0.01185, audio_tagging_loss=0.005821, over 15337.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08981, pruned_loss=0.01237, audio_tagging_loss=0.008658, over 3040988.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:28:20,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3544346.6666666665, ans=0.5 2023-11-28 14:28:24,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.904e+01 9.542e+01 1.021e+02 1.415e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 14:28:49,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3544480.0, ans=0.0 2023-11-28 14:28:49,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2023-11-28 14:29:06,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. 
limit=15.0 2023-11-28 14:29:11,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531700 2023-11-28 14:29:11,663 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:29:16,008 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2650, loss[loss=0.06788, simple_loss=0.09306, pruned_loss=0.01384, audio_tagging_loss=0.007507, over 14979.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09089, pruned_loss=0.0126, audio_tagging_loss=0.008505, over 3040595.13 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:29:19,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3544680.0, ans=0.125 2023-11-28 14:29:59,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3544880.0, ans=0.04949747468305833 2023-11-28 14:30:10,256 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531750 2023-11-28 14:30:14,513 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2700, loss[loss=0.07102, simple_loss=0.08556, pruned_loss=0.01673, audio_tagging_loss=0.01151, over 14149.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09058, pruned_loss=0.01256, audio_tagging_loss=0.00852, over 3033973.85 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:30:19,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.994e+01 9.441e+01 1.024e+02 1.210e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 14:30:30,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3545080.0, ans=0.2 2023-11-28 14:30:49,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-28 14:30:55,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-28 14:30:56,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-28 14:31:07,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531800 2023-11-28 14:31:08,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3545280.0, ans=0.035 2023-11-28 14:31:10,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3545280.0, ans=15.0 2023-11-28 14:31:12,779 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2750, loss[loss=0.06672, simple_loss=0.09636, pruned_loss=0.009552, audio_tagging_loss=0.008986, over 15790.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09079, pruned_loss=0.01262, audio_tagging_loss=0.008561, over 3037299.86 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:31:13,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3545346.6666666665, ans=0.2 2023-11-28 14:31:28,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. 
limit=15.0 2023-11-28 14:31:38,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3545480.0, ans=0.0 2023-11-28 14:31:39,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3545480.0, ans=0.1 2023-11-28 14:31:53,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3545546.6666666665, ans=0.0 2023-11-28 14:31:55,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3545546.6666666665, ans=0.0 2023-11-28 14:31:55,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-28 14:32:04,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3545613.3333333335, ans=0.0 2023-11-28 14:32:05,271 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:32:06,458 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531850 2023-11-28 14:32:10,862 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2800, loss[loss=0.07068, simple_loss=0.1079, pruned_loss=0.0111, audio_tagging_loss=0.005609, over 15537.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09021, pruned_loss=0.01228, audio_tagging_loss=0.008516, over 3037249.18 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:32:16,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.903e+01 9.379e+01 1.017e+02 3.083e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-28 14:32:25,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3545746.6666666665, ans=0.125 2023-11-28 14:32:26,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3545746.6666666665, ans=0.125 2023-11-28 14:32:54,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3545880.0, ans=0.0 2023-11-28 14:32:55,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3545946.6666666665, ans=10.0 2023-11-28 14:32:57,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 14:33:04,511 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531900 2023-11-28 14:33:05,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3545946.6666666665, ans=0.125 2023-11-28 14:33:08,792 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2850, loss[loss=0.07766, simple_loss=0.1037, pruned_loss=0.01576, audio_tagging_loss=0.01005, over 14358.00 frames. 
], tot_loss[loss=0.06513, simple_loss=0.08895, pruned_loss=0.01212, audio_tagging_loss=0.008542, over 3032938.67 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:33:15,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3546013.3333333335, ans=0.1 2023-11-28 14:33:43,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3546213.3333333335, ans=0.1 2023-11-28 14:33:46,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2023-11-28 14:33:52,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3546213.3333333335, ans=0.125 2023-11-28 14:34:01,846 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 531950 2023-11-28 14:34:06,186 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2900, loss[loss=0.05249, simple_loss=0.07282, pruned_loss=0.006279, audio_tagging_loss=0.009803, over 14781.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08915, pruned_loss=0.01211, audio_tagging_loss=0.008636, over 3032239.20 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:34:06,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3546346.6666666665, ans=0.125 2023-11-28 14:34:10,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3546346.6666666665, ans=0.125 2023-11-28 14:34:12,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.752e+01 9.369e+01 1.016e+02 1.365e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:34:19,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3546413.3333333335, ans=0.125 2023-11-28 14:34:27,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3546413.3333333335, ans=0.1 2023-11-28 14:34:30,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. limit=6.0 2023-11-28 14:35:00,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532000 2023-11-28 14:35:07,386 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 2950, loss[loss=0.05736, simple_loss=0.08244, pruned_loss=0.006875, audio_tagging_loss=0.009261, over 15382.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08994, pruned_loss=0.01219, audio_tagging_loss=0.008608, over 3034920.04 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:35:12,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3546680.0, ans=0.125 2023-11-28 14:35:13,303 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:35:23,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3546746.6666666665, ans=0.07 2023-11-28 14:35:23,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2023-11-28 14:35:27,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-11-28 14:35:44,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3546880.0, ans=0.125 2023-11-28 14:36:01,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532050 2023-11-28 14:36:06,000 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3000, loss[loss=0.06372, simple_loss=0.08808, pruned_loss=0.00983, audio_tagging_loss=0.009852, over 14943.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.0898, pruned_loss=0.01203, audio_tagging_loss=0.008613, over 3038535.55 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:36:06,001 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 14:36:41,297 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05774, simple_loss=0.05054, pruned_loss=0.005299, audio_tagging_loss=0.02717, over 4681554.00 frames. 2023-11-28 14:36:41,298 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 14:36:46,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.901e+01 9.475e+01 1.021e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 14:36:58,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3547080.0, ans=0.0 2023-11-28 14:37:13,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-11-28 14:37:17,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3547213.3333333335, ans=0.2 2023-11-28 14:37:34,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532100 2023-11-28 14:37:39,441 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3050, loss[loss=0.03974, simple_loss=0.04738, pruned_loss=0.006437, audio_tagging_loss=0.009609, over 16027.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08962, pruned_loss=0.01195, audio_tagging_loss=0.008637, over 3041506.36 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:37:49,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3547346.6666666665, ans=0.07 2023-11-28 14:37:53,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3547413.3333333335, ans=0.125 2023-11-28 14:38:15,454 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
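These recurring WARNING entries record a length filter on the 1-second AudioSet cuts: after the frontend's 4x subsampling, the 100 input frames shrink below the 24-token dummy transcript, so the cut is dropped before the transducer loss. A minimal sketch of the arithmetic, assuming the usual ((T - 7) // 2) // 2 frontend formula; keep_cut is a hypothetical name, not the recipe's exact code:

```python
# Hypothetical reconstruction of the length filter behind these WARNINGs.
# Assumption: the conv frontend subsamples as ((T - 7) // 2) // 2, which
# maps the 100-frame (1 s) cuts to the 23 frames reported below.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after = ((num_frames - 7) // 2) // 2  # frames after subsampling
    return frames_after >= num_tokens  # drop cuts with more tokens than frames

# The dummy transcript tokenizes to 24 BPE pieces, so 23 < 24 -> excluded.
assert keep_cut(100, 24) is False
```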
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:38:23,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3547546.6666666665, ans=0.0 2023-11-28 14:38:33,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532150 2023-11-28 14:38:38,099 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3100, loss[loss=0.05291, simple_loss=0.06746, pruned_loss=0.008522, audio_tagging_loss=0.01065, over 13534.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08995, pruned_loss=0.01188, audio_tagging_loss=0.008714, over 3038377.36 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:38:43,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.815e+01 9.478e+01 1.004e+02 1.302e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:38:56,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.71 vs. limit=15.0 2023-11-28 14:38:58,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-28 14:38:58,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3547746.6666666665, ans=0.07 2023-11-28 14:38:59,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3547813.3333333335, ans=0.0 2023-11-28 14:39:31,188 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532200 2023-11-28 14:39:35,925 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3150, loss[loss=0.05489, simple_loss=0.07465, pruned_loss=0.007864, audio_tagging_loss=0.009703, over 15708.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08989, pruned_loss=0.012, audio_tagging_loss=0.008741, over 3046008.90 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:39:43,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3548013.3333333335, ans=0.125 2023-11-28 14:39:45,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3548013.3333333335, ans=0.0 2023-11-28 14:40:31,150 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:41:07,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3548280.0, ans=0.0 2023-11-28 14:41:16,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532250 2023-11-28 14:41:23,861 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3200, loss[loss=0.05919, simple_loss=0.07349, pruned_loss=0.01135, audio_tagging_loss=0.0111, over 14768.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08962, pruned_loss=0.01203, audio_tagging_loss=0.008914, over 3045587.17 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:41:29,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3548346.6666666665, ans=0.0 2023-11-28 14:41:33,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.062e+01 8.970e+01 9.466e+01 1.009e+02 1.247e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 14:41:39,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3548346.6666666665, ans=0.1 2023-11-28 14:42:28,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548546.6666666665, ans=0.1 2023-11-28 14:42:36,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=10.0 2023-11-28 14:42:50,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-28 14:42:53,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-28 14:42:55,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-28 14:42:57,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-28 14:43:00,666 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3250, loss[loss=0.06556, simple_loss=0.09398, pruned_loss=0.008778, audio_tagging_loss=0.009789, over 15176.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08922, pruned_loss=0.01204, audio_tagging_loss=0.009008, over 3052135.90 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:43:16,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548680.0, ans=0.1 2023-11-28 14:43:55,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3548880.0, ans=0.125 2023-11-28 14:44:02,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-28 14:44:16,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:44:18,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548946.6666666665, ans=0.1 2023-11-28 14:44:27,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-28 14:44:29,997 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:44:35,324 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3300, loss[loss=0.04885, simple_loss=0.0658, pruned_loss=0.006347, audio_tagging_loss=0.009604, over 15265.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09055, pruned_loss=0.01233, audio_tagging_loss=0.008982, over 3047676.91 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:44:49,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.914e+01 9.371e+01 1.015e+02 1.378e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:44:56,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549080.0, ans=0.1 2023-11-28 14:45:11,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3549146.6666666665, ans=0.0 2023-11-28 14:45:25,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549146.6666666665, ans=0.1 2023-11-28 14:45:28,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=22.5 2023-11-28 14:45:39,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3549213.3333333335, ans=0.0 2023-11-28 14:45:55,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-28 14:45:56,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=22.5 2023-11-28 14:46:02,255 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3350, loss[loss=0.07727, simple_loss=0.107, pruned_loss=0.01572, audio_tagging_loss=0.008076, over 16560.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09083, pruned_loss=0.01232, audio_tagging_loss=0.008922, over 3051079.11 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:46:11,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-28 14:46:28,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-11-28 14:47:18,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-28 14:47:20,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-28 14:47:25,139 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3400, loss[loss=0.07032, simple_loss=0.09764, pruned_loss=0.01127, audio_tagging_loss=0.01023, over 14553.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09103, pruned_loss=0.01233, audio_tagging_loss=0.008805, over 3052175.40 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:47:33,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. 
limit=5.0 2023-11-28 14:47:35,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.841e+01 9.503e+01 1.045e+02 1.895e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 14:47:58,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3549813.3333333335, ans=0.0 2023-11-28 14:48:23,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3549880.0, ans=0.025 2023-11-28 14:48:37,309 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-28 14:48:42,914 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3450, loss[loss=0.05543, simple_loss=0.0769, pruned_loss=0.009034, audio_tagging_loss=0.007944, over 14684.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09032, pruned_loss=0.01216, audio_tagging_loss=0.008665, over 3051203.90 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:48:46,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3550013.3333333335, ans=0.0 2023-11-28 14:49:08,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550080.0, ans=0.1 2023-11-28 14:49:19,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3550146.6666666665, ans=0.125 2023-11-28 14:49:40,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3550213.3333333335, ans=0.125 2023-11-28 14:49:50,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3550280.0, ans=0.2 2023-11-28 14:49:53,874 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532550 2023-11-28 14:49:54,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3550280.0, ans=0.125 2023-11-28 14:49:55,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3550280.0, ans=0.2 2023-11-28 14:49:59,812 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3500, loss[loss=0.06248, simple_loss=0.07981, pruned_loss=0.01375, audio_tagging_loss=0.008828, over 15226.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0904, pruned_loss=0.01221, audio_tagging_loss=0.008547, over 3050564.05 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:50:08,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.938e+01 9.500e+01 1.028e+02 1.250e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 14:50:39,268 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
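The optim.py Clipping_scale entries throughout this log print five statistics of recent gradient norms (min, 25%, median, 75%, max), and the printed threshold consistently tracks 2.0 times the median, which suggests the clipping threshold is Clipping_scale times a running median of batch gradient norms; that reading is inferred from the logged numbers, not taken from the optimizer source. Checking the Clipping_scale line above:

```python
# Check the threshold printed in the Clipping_scale entry above against
# 2.0 x the median gradient norm; the "scale x median" rule is inferred
# from this log's numbers, not from optim.py itself.
quartiles = [7.293e+01, 8.841e+01, 9.503e+01, 1.045e+02, 1.895e+02]
clipping_scale = 2.0

threshold = clipping_scale * quartiles[2]  # 2.0 * median grad norm
print(threshold)  # 190.06, i.e. the logged threshold of 1.901e+02
```

The percent-clipped figure on the same line then appears to report how often recent batches exceeded that threshold (1.0 here after a batch whose norm of 3.083e+02 overshot it, 0.0 in most other summaries).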
Number of tokens: 24 2023-11-28 14:51:07,251 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532600 2023-11-28 14:51:13,082 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3550, loss[loss=0.05856, simple_loss=0.08199, pruned_loss=0.008918, audio_tagging_loss=0.008649, over 14914.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08964, pruned_loss=0.01221, audio_tagging_loss=0.008562, over 3044218.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:51:51,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.21 vs. limit=15.0 2023-11-28 14:51:55,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5 2023-11-28 14:52:03,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3550880.0, ans=0.125 2023-11-28 14:52:10,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3550946.6666666665, ans=0.0 2023-11-28 14:52:18,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532650 2023-11-28 14:52:20,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-11-28 14:52:24,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-11-28 14:52:24,491 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3600, loss[loss=0.04629, simple_loss=0.0615, pruned_loss=0.007096, audio_tagging_loss=0.008438, over 15786.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08861, pruned_loss=0.01209, audio_tagging_loss=0.00862, over 3045895.28 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:52:32,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.883e+01 9.624e+01 1.026e+02 1.265e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 14:52:43,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3551080.0, ans=0.2 2023-11-28 14:52:43,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=10.0 2023-11-28 14:52:45,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3551080.0, ans=0.0 2023-11-28 14:52:59,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3551146.6666666665, ans=0.0 2023-11-28 14:53:14,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2023-11-28 14:53:15,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3551213.3333333335, ans=0.125 2023-11-28 14:53:29,923 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532700 2023-11-28 14:53:34,854 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3650, loss[loss=0.03402, simple_loss=0.03583, pruned_loss=0.003324, audio_tagging_loss=0.01278, over 15386.00 frames. 
], tot_loss[loss=0.06463, simple_loss=0.08837, pruned_loss=0.01185, audio_tagging_loss=0.008591, over 3039677.91 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:53:39,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3551346.6666666665, ans=0.07 2023-11-28 14:54:05,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3551480.0, ans=0.125 2023-11-28 14:54:09,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551480.0, ans=0.1 2023-11-28 14:54:39,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532750 2023-11-28 14:54:43,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2023-11-28 14:54:45,629 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3700, loss[loss=0.09552, simple_loss=0.1401, pruned_loss=0.01971, audio_tagging_loss=0.005742, over 15939.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08987, pruned_loss=0.01205, audio_tagging_loss=0.00854, over 3047724.63 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:54:48,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3551680.0, ans=0.125 2023-11-28 14:54:55,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.996e+01 9.675e+01 1.033e+02 1.277e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 14:55:07,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3551746.6666666665, ans=0.0 2023-11-28 14:55:13,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3551813.3333333335, ans=0.2 2023-11-28 14:55:47,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532800 2023-11-28 14:55:53,253 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3750, loss[loss=0.07381, simple_loss=0.1022, pruned_loss=0.01571, audio_tagging_loss=0.007001, over 15702.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09025, pruned_loss=0.01222, audio_tagging_loss=0.00853, over 3054417.41 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:55:54,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3552013.3333333335, ans=0.0 2023-11-28 14:56:12,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3552080.0, ans=0.125 2023-11-28 14:56:40,613 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:56:51,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.22 vs. 
limit=15.0 2023-11-28 14:56:54,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532850 2023-11-28 14:56:55,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3552280.0, ans=0.07 2023-11-28 14:56:58,856 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3800, loss[loss=0.07381, simple_loss=0.1007, pruned_loss=0.01586, audio_tagging_loss=0.007611, over 15537.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09023, pruned_loss=0.01208, audio_tagging_loss=0.008573, over 3056426.59 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:57:07,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 9.078e+01 9.747e+01 1.050e+02 1.632e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-28 14:57:07,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-11-28 14:57:28,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-11-28 14:57:32,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3552480.0, ans=0.0 2023-11-28 14:57:34,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3552546.6666666665, ans=0.125 2023-11-28 14:57:50,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3552613.3333333335, ans=0.125 2023-11-28 14:57:56,412 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532900 2023-11-28 14:57:59,257 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:58:01,930 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3850, loss[loss=0.07721, simple_loss=0.1034, pruned_loss=0.01709, audio_tagging_loss=0.008403, over 15219.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09091, pruned_loss=0.01227, audio_tagging_loss=0.008544, over 3052515.96 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:58:31,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3552813.3333333335, ans=0.1 2023-11-28 14:58:33,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3552813.3333333335, ans=0.1 2023-11-28 14:58:43,942 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:58:50,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3552880.0, ans=15.0 2023-11-28 14:58:51,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3552946.6666666665, ans=0.0 2023-11-28 14:58:59,531 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 532950 2023-11-28 14:59:04,039 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3900, loss[loss=0.05943, simple_loss=0.08554, pruned_loss=0.01047, audio_tagging_loss=0.006192, over 13420.00 frames. 
], tot_loss[loss=0.06585, simple_loss=0.09006, pruned_loss=0.01218, audio_tagging_loss=0.008635, over 3051399.26 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:59:12,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.774e+01 9.422e+01 1.004e+02 1.392e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 14:59:17,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3553080.0, ans=0.0 2023-11-28 14:59:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3553146.6666666665, ans=0.125 2023-11-28 14:59:41,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3553213.3333333335, ans=0.07 2023-11-28 14:59:53,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553280.0, ans=0.1 2023-11-28 14:59:53,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3553280.0, ans=0.2 2023-11-28 14:59:59,782 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533000 2023-11-28 14:59:59,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3553280.0, ans=0.05 2023-11-28 15:00:05,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=22.5 2023-11-28 15:00:05,526 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 3950, loss[loss=0.06878, simple_loss=0.09178, pruned_loss=0.01444, audio_tagging_loss=0.008441, over 15572.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09053, pruned_loss=0.01229, audio_tagging_loss=0.008789, over 3051919.73 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:00:36,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3553480.0, ans=0.125 2023-11-28 15:00:37,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3553480.0, ans=0.2 2023-11-28 15:01:00,256 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533050 2023-11-28 15:01:04,970 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4000, loss[loss=0.05648, simple_loss=0.07214, pruned_loss=0.01094, audio_tagging_loss=0.009468, over 13785.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09026, pruned_loss=0.01204, audio_tagging_loss=0.008791, over 3051014.84 frames. 
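[Editor's note: across this section the logged tot_loss matches a fixed weighting of its three components, tot_loss ≈ 0.5 · simple_loss + pruned_loss + audio_tagging_loss. The 0.5 weight is inferred from the logged numbers themselves, not read from train_asr.py. A quick check in Python against the epoch 45, batch 3900 totals above:]

    # Weights inferred from the logged values (an assumption, not a quote of train_asr.py).
    simple_loss, pruned_loss, audio_tagging_loss = 0.09006, 0.01218, 0.008635
    tot = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    assert abs(tot - 0.06585) < 1e-4  # matches the tot_loss logged for epoch 45, batch 3900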
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:01:07,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3553680.0, ans=0.2 2023-11-28 15:01:13,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 9.021e+01 9.610e+01 1.042e+02 1.658e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 15:01:31,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3553813.3333333335, ans=0.125 2023-11-28 15:01:43,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3553880.0, ans=0.125 2023-11-28 15:01:49,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3553880.0, ans=0.125 2023-11-28 15:02:00,875 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533100 2023-11-28 15:02:02,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3553946.6666666665, ans=0.125 2023-11-28 15:02:05,828 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4050, loss[loss=0.06728, simple_loss=0.09351, pruned_loss=0.009681, audio_tagging_loss=0.01084, over 15534.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09053, pruned_loss=0.01218, audio_tagging_loss=0.008842, over 3048923.46 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:02:10,547 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:02:21,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3554080.0, ans=0.125 2023-11-28 15:02:41,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3554213.3333333335, ans=0.125 2023-11-28 15:03:01,399 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533150 2023-11-28 15:03:06,495 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4100, loss[loss=0.06161, simple_loss=0.08034, pruned_loss=0.0134, audio_tagging_loss=0.008038, over 15037.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09024, pruned_loss=0.01221, audio_tagging_loss=0.00885, over 3053639.37 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:03:13,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554346.6666666665, ans=0.1 2023-11-28 15:03:16,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.818e+01 9.455e+01 1.012e+02 1.204e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 15:03:17,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=15.0 2023-11-28 15:03:22,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3554413.3333333335, ans=0.0 2023-11-28 15:03:26,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3554413.3333333335, ans=0.0 2023-11-28 15:03:42,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3554546.6666666665, ans=0.125 2023-11-28 15:03:49,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0 2023-11-28 15:03:58,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554613.3333333335, ans=0.1 2023-11-28 15:04:01,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-28 15:04:06,996 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4150, loss[loss=0.07832, simple_loss=0.111, pruned_loss=0.01471, audio_tagging_loss=0.008108, over 15069.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09042, pruned_loss=0.01225, audio_tagging_loss=0.008627, over 3058924.03 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:04:28,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3554746.6666666665, ans=0.125 2023-11-28 15:04:29,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-28 15:04:49,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3554880.0, ans=0.125 2023-11-28 15:04:51,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3554880.0, ans=0.125 2023-11-28 15:04:52,379 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:04:59,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3554946.6666666665, ans=0.125 2023-11-28 15:05:01,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-28 15:05:03,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3554946.6666666665, ans=0.0 2023-11-28 15:05:06,749 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4200, loss[loss=0.06239, simple_loss=0.09392, pruned_loss=0.008758, audio_tagging_loss=0.006674, over 15689.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09051, pruned_loss=0.01222, audio_tagging_loss=0.008537, over 3068734.94 frames. 
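[Editor's note: the "Exclude cut" warnings fire when a cut would have fewer encoder frames after subsampling than BPE tokens, which makes the transducer loss undefined. A minimal sketch of that check; the subsampling arithmetic below reproduces the logged 100 → 23 exactly, though the precise expression in train_asr.py may differ:]

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Conv frontend with overall subsampling factor ~4 (two /2 stages).
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        # Transducer training needs at least one output frame per token.
        return frames_after >= num_tokens

    keep_cut(100, 24)  # False: 23 frames < 24 tokens, so the cut is excluded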
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:05:08,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555013.3333333335, ans=0.1 2023-11-28 15:05:10,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3555013.3333333335, ans=0.2 2023-11-28 15:05:15,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.572e+01 9.503e+01 1.029e+02 1.274e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 15:05:17,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3555080.0, ans=0.1 2023-11-28 15:05:18,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3555080.0, ans=0.125 2023-11-28 15:05:53,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5 2023-11-28 15:05:56,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3555280.0, ans=0.5 2023-11-28 15:05:57,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3555280.0, ans=0.0 2023-11-28 15:06:00,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-28 15:06:05,322 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4250, loss[loss=0.09732, simple_loss=0.1278, pruned_loss=0.02725, audio_tagging_loss=0.006156, over 15717.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09096, pruned_loss=0.01246, audio_tagging_loss=0.008503, over 3061075.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:06:13,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3555346.6666666665, ans=0.1 2023-11-28 15:06:15,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3555346.6666666665, ans=15.0 2023-11-28 15:06:18,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=22.5 2023-11-28 15:06:25,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3555413.3333333335, ans=0.0 2023-11-28 15:06:41,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3555546.6666666665, ans=0.125 2023-11-28 15:06:44,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3555546.6666666665, ans=0.2 2023-11-28 15:06:58,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3555613.3333333335, ans=0.2 2023-11-28 15:07:00,047 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-28 15:07:04,456 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4300, loss[loss=0.06661, simple_loss=0.09542, pruned_loss=0.01149, audio_tagging_loss=0.007403, over 15598.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09021, pruned_loss=0.01223, audio_tagging_loss=0.008494, over 3050759.55 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:07:13,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.132e+01 9.735e+01 1.057e+02 1.337e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 15:07:31,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2023-11-28 15:07:37,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3555813.3333333335, ans=0.125 2023-11-28 15:07:58,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-28 15:08:03,640 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4350, loss[loss=0.08529, simple_loss=0.1224, pruned_loss=0.0179, audio_tagging_loss=0.006184, over 14977.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09107, pruned_loss=0.01223, audio_tagging_loss=0.008408, over 3040329.84 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:08:27,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3556146.6666666665, ans=0.5 2023-11-28 15:08:31,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3556146.6666666665, ans=0.125 2023-11-28 15:08:43,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3556213.3333333335, ans=0.0 2023-11-28 15:08:57,868 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-28 15:08:59,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3556280.0, ans=0.125 2023-11-28 15:09:02,406 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4400, loss[loss=0.06023, simple_loss=0.08718, pruned_loss=0.01059, audio_tagging_loss=0.006042, over 15883.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09055, pruned_loss=0.01233, audio_tagging_loss=0.00841, over 3034280.03 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:09:07,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3556346.6666666665, ans=0.0 2023-11-28 15:09:12,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.943e+01 9.727e+01 1.045e+02 1.586e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 15:10:02,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-28 15:10:02,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3556613.3333333335, ans=0.125 2023-11-28 15:10:06,992 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4450, loss[loss=0.06419, simple_loss=0.09222, pruned_loss=0.009616, audio_tagging_loss=0.008457, over 16150.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09099, pruned_loss=0.0123, audio_tagging_loss=0.008371, over 3046339.62 frames. 
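[Editor's note: in every "Clipping_scale=2.0" line the reported threshold equals twice the median grad norm (here 2 × 9.735e+01 = 1.947e+02), i.e. gradients are clipped relative to a running median of recent norms rather than a fixed constant. A hedged sketch of that bookkeeping; ScaledAdam's actual optim.py logic is more involved:]

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quartiles of recent per-batch gradient norms, as logged.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2x the median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped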
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:10:22,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3556746.6666666665, ans=0.1 2023-11-28 15:10:27,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3556746.6666666665, ans=0.0 2023-11-28 15:11:02,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3556946.6666666665, ans=0.125 2023-11-28 15:11:03,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3556946.6666666665, ans=0.1 2023-11-28 15:11:03,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2023-11-28 15:11:06,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-28 15:11:10,946 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4500, loss[loss=0.04694, simple_loss=0.06335, pruned_loss=0.005839, audio_tagging_loss=0.009427, over 16049.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09126, pruned_loss=0.01249, audio_tagging_loss=0.00841, over 3044483.48 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:11:18,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3557013.3333333335, ans=0.0 2023-11-28 15:11:21,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3557013.3333333335, ans=0.0 2023-11-28 15:11:22,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.792e+01 9.317e+01 1.000e+02 1.287e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 15:11:51,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2023-11-28 15:11:57,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=22.5 2023-11-28 15:12:10,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-28 15:12:15,974 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4550, loss[loss=0.05449, simple_loss=0.07762, pruned_loss=0.008402, audio_tagging_loss=0.007285, over 13660.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08896, pruned_loss=0.01207, audio_tagging_loss=0.008441, over 3034031.86 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:12:26,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2023-11-28 15:12:27,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3557413.3333333335, ans=0.1 2023-11-28 15:12:53,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3557546.6666666665, ans=0.125 2023-11-28 15:12:58,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3557546.6666666665, ans=0.125 2023-11-28 15:13:06,984 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:13:14,942 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-28 15:13:19,726 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4600, loss[loss=0.06536, simple_loss=0.09902, pruned_loss=0.01044, audio_tagging_loss=0.005411, over 15733.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.0889, pruned_loss=0.01213, audio_tagging_loss=0.008566, over 3044256.52 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:13:29,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.928e+01 9.397e+01 1.022e+02 1.415e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 15:13:32,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3557746.6666666665, ans=0.0 2023-11-28 15:13:32,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-28 15:13:38,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3557746.6666666665, ans=0.125 2023-11-28 15:13:43,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3557746.6666666665, ans=0.125 2023-11-28 15:13:43,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.09 vs. limit=22.5 2023-11-28 15:13:44,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3557813.3333333335, ans=0.05 2023-11-28 15:14:16,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-28 15:14:17,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-28 15:14:22,823 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4650, loss[loss=0.05252, simple_loss=0.06607, pruned_loss=0.01074, audio_tagging_loss=0.008746, over 14810.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08931, pruned_loss=0.01227, audio_tagging_loss=0.008656, over 3048384.77 frames. 
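[Editor's note: the scaling.py:213 lines trace ScheduledFloat hyperparameters, module constants (dropout rates, skip rates, balancer probabilities) that follow a piecewise-linear schedule in batch_count. A minimal sketch of the interpolation, assuming (batch, value) breakpoints as in icefall's scaling.py; the breakpoints shown are illustrative, not taken from this recipe:]

    def scheduled_float(batch_count: float, *points: tuple) -> float:
        # points: (batch, value) breakpoints in increasing batch order.
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                # Linear interpolation between the surrounding breakpoints.
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0  # past the last breakpoint the value is held constant

    scheduled_float(3553680.0, (0.0, 0.3), (20000.0, 0.1))  # 0.1: schedule long since flat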
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:14:25,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-28 15:14:54,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=12.0 2023-11-28 15:14:59,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3558146.6666666665, ans=0.125 2023-11-28 15:15:00,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3558213.3333333335, ans=0.125 2023-11-28 15:15:08,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3558213.3333333335, ans=0.125 2023-11-28 15:15:13,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-28 15:15:21,839 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-28 15:15:27,160 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4700, loss[loss=0.06082, simple_loss=0.07722, pruned_loss=0.01329, audio_tagging_loss=0.008925, over 16261.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08963, pruned_loss=0.01231, audio_tagging_loss=0.008728, over 3053057.81 frames. ], batch size: 64, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:15:36,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3558346.6666666665, ans=0.07 2023-11-28 15:15:38,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.852e+01 9.419e+01 1.023e+02 1.642e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 15:15:48,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3558413.3333333335, ans=0.125 2023-11-28 15:15:50,396 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:15:50,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.62 vs. 
limit=15.0 2023-11-28 15:15:51,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3558480.0, ans=0.2 2023-11-28 15:16:01,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3558480.0, ans=0.125 2023-11-28 15:16:04,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3558546.6666666665, ans=0.125 2023-11-28 15:16:20,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3558613.3333333335, ans=0.1 2023-11-28 15:16:22,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3558613.3333333335, ans=0.125 2023-11-28 15:16:26,094 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-28 15:16:31,104 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4750, loss[loss=0.05723, simple_loss=0.06871, pruned_loss=0.01209, audio_tagging_loss=0.01079, over 15719.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0906, pruned_loss=0.01245, audio_tagging_loss=0.008813, over 3049943.45 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:16:34,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3558680.0, ans=0.0 2023-11-28 15:16:38,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3558680.0, ans=0.125 2023-11-28 15:17:01,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3558813.3333333335, ans=0.1 2023-11-28 15:17:14,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3558880.0, ans=0.0 2023-11-28 15:17:29,151 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-28 15:17:34,721 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4800, loss[loss=0.07118, simple_loss=0.09521, pruned_loss=0.01564, audio_tagging_loss=0.007938, over 15561.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09049, pruned_loss=0.01235, audio_tagging_loss=0.008896, over 3054144.47 frames. 
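[Editor's note: the "Whitening: ... metric=X vs. limit=Y" lines compare how far a module's channel covariance is from isotropic against a scheduled limit; when the metric exceeds the limit, a penalty gradient pushes the activations toward whiter statistics. One plausible form of the metric (an assumption, not a quote of scaling.py), which equals 1.0 for perfectly white features:]

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations of one whitening group
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # Eigenvalue dispersion: 1.0 iff all eigenvalues are equal (white).
        return float((eigs**2).mean() / eigs.mean() ** 2)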
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:17:45,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.221e+01 9.663e+01 1.040e+02 1.234e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 15:17:46,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3559080.0, ans=0.125 2023-11-28 15:17:46,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3559080.0, ans=0.0 2023-11-28 15:18:08,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3559146.6666666665, ans=0.2 2023-11-28 15:18:31,401 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-28 15:18:31,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3559280.0, ans=0.1 2023-11-28 15:18:34,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3559280.0, ans=0.1 2023-11-28 15:18:36,127 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4850, loss[loss=0.08139, simple_loss=0.1145, pruned_loss=0.01599, audio_tagging_loss=0.008162, over 14649.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09043, pruned_loss=0.01228, audio_tagging_loss=0.008847, over 3057783.32 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:18:41,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3559346.6666666665, ans=0.125 2023-11-28 15:18:50,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-11-28 15:19:00,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3559480.0, ans=0.125 2023-11-28 15:19:02,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3559480.0, ans=0.1 2023-11-28 15:19:13,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3559546.6666666665, ans=0.1 2023-11-28 15:19:25,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3559613.3333333335, ans=0.0 2023-11-28 15:19:33,336 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-28 15:19:33,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3559613.3333333335, ans=0.2 2023-11-28 15:19:38,542 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4900, loss[loss=0.07111, simple_loss=0.0949, pruned_loss=0.01596, audio_tagging_loss=0.007701, over 15371.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09028, pruned_loss=0.01221, audio_tagging_loss=0.008852, over 3055532.44 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:19:43,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3559680.0, ans=0.07 2023-11-28 15:19:49,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 9.026e+01 9.693e+01 1.038e+02 1.259e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 15:20:00,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3559746.6666666665, ans=0.1 2023-11-28 15:20:08,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3559813.3333333335, ans=0.125 2023-11-28 15:20:35,503 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-28 15:20:40,394 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 4950, loss[loss=0.06139, simple_loss=0.08126, pruned_loss=0.01112, audio_tagging_loss=0.009637, over 15416.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09027, pruned_loss=0.01238, audio_tagging_loss=0.008761, over 3048305.71 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:20:41,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3560013.3333333335, ans=0.0 2023-11-28 15:20:49,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2023-11-28 15:21:01,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3560080.0, ans=0.125 2023-11-28 15:21:05,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3560146.6666666665, ans=0.0 2023-11-28 15:21:06,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3560146.6666666665, ans=0.07 2023-11-28 15:21:21,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3560213.3333333335, ans=0.2 2023-11-28 15:21:25,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3560213.3333333335, ans=0.0 2023-11-28 15:21:34,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3560280.0, ans=0.125 2023-11-28 15:21:37,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-28 15:21:39,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3560280.0, ans=0.2 2023-11-28 15:21:42,831 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5000, loss[loss=0.06709, simple_loss=0.0874, pruned_loss=0.01318, audio_tagging_loss=0.01021, over 15674.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08977, pruned_loss=0.01227, audio_tagging_loss=0.008579, over 3036820.42 frames. 
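[Editor's note: grad_scale in the loss lines flips between 16.0 and 32.0 across this section, the signature of dynamic fp16 loss scaling: the scale is grown after a run of overflow-free steps and halved when an overflow is detected. A generic torch.cuda.amp sketch; the recipe may drive its GradScaler differently, and compute_loss/optimizer are hypothetical names:]

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)
    # Per batch (schematic):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)   # hypothetical helper
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # doubles the scale after growth_interval clean steps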
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:21:53,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.823e+01 9.586e+01 1.031e+02 1.320e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 15:21:57,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3560413.3333333335, ans=0.025 2023-11-28 15:22:09,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3560480.0, ans=10.0 2023-11-28 15:22:12,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3560480.0, ans=0.0 2023-11-28 15:22:30,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3560546.6666666665, ans=0.0 2023-11-28 15:22:38,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3560613.3333333335, ans=0.125 2023-11-28 15:22:39,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-28 15:22:40,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=22.5 2023-11-28 15:22:42,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2023-11-28 15:22:45,173 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5050, loss[loss=0.06664, simple_loss=0.08835, pruned_loss=0.01328, audio_tagging_loss=0.009182, over 15383.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08981, pruned_loss=0.01232, audio_tagging_loss=0.008437, over 3032586.43 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:22:53,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3560680.0, ans=0.125 2023-11-28 15:22:58,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3560746.6666666665, ans=0.125 2023-11-28 15:23:15,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3560813.3333333335, ans=0.125 2023-11-28 15:23:41,515 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-28 15:23:46,158 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5100, loss[loss=0.06868, simple_loss=0.08617, pruned_loss=0.01526, audio_tagging_loss=0.01034, over 15821.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08991, pruned_loss=0.01234, audio_tagging_loss=0.008459, over 3036039.01 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:23:58,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.022e+01 9.569e+01 1.030e+02 1.259e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:24:15,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3561146.6666666665, ans=0.0 2023-11-28 15:24:43,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. 
limit=22.5 2023-11-28 15:24:43,895 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-28 15:24:48,758 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5150, loss[loss=0.09346, simple_loss=0.1398, pruned_loss=0.01841, audio_tagging_loss=0.005156, over 15712.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08913, pruned_loss=0.01226, audio_tagging_loss=0.008555, over 3036659.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:25:03,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.21 vs. limit=10.0 2023-11-28 15:25:33,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:25:46,202 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-28 15:25:50,876 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5200, loss[loss=0.05152, simple_loss=0.06651, pruned_loss=0.009743, audio_tagging_loss=0.008521, over 15792.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08814, pruned_loss=0.01203, audio_tagging_loss=0.008518, over 3031533.86 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:26:00,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3561680.0, ans=0.1 2023-11-28 15:26:03,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.561e+01 9.249e+01 1.010e+02 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 15:26:16,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3561813.3333333335, ans=0.125 2023-11-28 15:26:48,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-28 15:26:53,365 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5250, loss[loss=0.07038, simple_loss=0.09877, pruned_loss=0.01348, audio_tagging_loss=0.007515, over 15757.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08827, pruned_loss=0.0121, audio_tagging_loss=0.008555, over 3036862.37 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:26:53,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562013.3333333335, ans=0.1 2023-11-28 15:27:16,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3562080.0, ans=0.125 2023-11-28 15:27:43,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3562280.0, ans=0.125 2023-11-28 15:27:50,508 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-28 15:27:53,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3562280.0, ans=0.125 2023-11-28 15:27:55,103 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5300, loss[loss=0.07252, simple_loss=0.1013, pruned_loss=0.01597, audio_tagging_loss=0.005874, over 15926.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08878, pruned_loss=0.01214, audio_tagging_loss=0.008444, over 3036972.74 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:28:07,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.987e+01 9.599e+01 1.032e+02 1.281e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 15:28:11,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-28 15:28:27,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3562480.0, ans=0.125 2023-11-28 15:28:39,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3562546.6666666665, ans=0.2 2023-11-28 15:28:43,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3562546.6666666665, ans=0.0 2023-11-28 15:28:43,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3562613.3333333335, ans=0.125 2023-11-28 15:28:49,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3562613.3333333335, ans=0.125 2023-11-28 15:28:52,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-28 15:28:57,891 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5350, loss[loss=0.05168, simple_loss=0.07047, pruned_loss=0.00798, audio_tagging_loss=0.008466, over 15301.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08913, pruned_loss=0.01209, audio_tagging_loss=0.008456, over 3046376.51 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:29:13,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3562746.6666666665, ans=0.125 2023-11-28 15:29:31,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-11-28 15:29:55,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-28 15:29:55,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3562946.6666666665, ans=0.1 2023-11-28 15:29:59,990 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5400, loss[loss=0.05365, simple_loss=0.07061, pruned_loss=0.008963, audio_tagging_loss=0.009383, over 16053.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08888, pruned_loss=0.01201, audio_tagging_loss=0.008453, over 3041002.14 frames. 
], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:30:00,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3563013.3333333335, ans=0.0 2023-11-28 15:30:14,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 8.988e+01 9.532e+01 1.029e+02 1.170e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 15:30:26,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3563146.6666666665, ans=0.125 2023-11-28 15:30:39,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3563213.3333333335, ans=0.2 2023-11-28 15:30:57,517 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-28 15:31:02,828 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5450, loss[loss=0.06065, simple_loss=0.08463, pruned_loss=0.01059, audio_tagging_loss=0.007745, over 14450.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08994, pruned_loss=0.01215, audio_tagging_loss=0.008531, over 3049425.32 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:31:05,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3563346.6666666665, ans=0.0 2023-11-28 15:31:06,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3563346.6666666665, ans=0.125 2023-11-28 15:31:21,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3563413.3333333335, ans=0.125 2023-11-28 15:31:58,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2023-11-28 15:32:00,294 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-28 15:32:04,882 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5500, loss[loss=0.06235, simple_loss=0.08975, pruned_loss=0.009237, audio_tagging_loss=0.008238, over 15116.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09054, pruned_loss=0.01231, audio_tagging_loss=0.008597, over 3051615.43 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:32:08,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3563680.0, ans=0.0 2023-11-28 15:32:11,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3563680.0, ans=0.0 2023-11-28 15:32:15,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3563680.0, ans=0.0 2023-11-28 15:32:16,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3563746.6666666665, ans=0.0 2023-11-28 15:32:18,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.919e+01 9.709e+01 1.036e+02 2.693e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 15:32:33,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3563813.3333333335, ans=0.125 2023-11-28 15:32:37,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3563813.3333333335, ans=0.125 2023-11-28 15:32:38,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-28 15:32:46,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-28 15:32:52,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2023-11-28 15:32:56,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=12.0 2023-11-28 15:33:01,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-28 15:33:06,890 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5550, loss[loss=0.05177, simple_loss=0.07383, pruned_loss=0.006253, audio_tagging_loss=0.008602, over 14356.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09012, pruned_loss=0.0122, audio_tagging_loss=0.008692, over 3053317.60 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:33:17,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3564080.0, ans=0.2 2023-11-28 15:33:29,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3564080.0, ans=0.125 2023-11-28 15:33:34,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3564146.6666666665, ans=0.0 2023-11-28 15:33:42,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3564146.6666666665, ans=0.0 2023-11-28 15:33:48,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.98 vs. 
limit=15.0 2023-11-28 15:33:54,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3564213.3333333335, ans=0.125 2023-11-28 15:34:02,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3564280.0, ans=0.2 2023-11-28 15:34:03,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-28 15:34:09,194 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5600, loss[loss=0.06815, simple_loss=0.09161, pruned_loss=0.01442, audio_tagging_loss=0.007922, over 16710.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09037, pruned_loss=0.01229, audio_tagging_loss=0.008706, over 3041374.40 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:34:10,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3564346.6666666665, ans=0.0 2023-11-28 15:34:23,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.970e+01 9.641e+01 1.032e+02 1.547e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 15:34:24,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3564413.3333333335, ans=0.2 2023-11-28 15:34:40,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3564480.0, ans=0.1 2023-11-28 15:34:43,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3564480.0, ans=0.1 2023-11-28 15:34:49,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2023-11-28 15:34:56,226 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:35:03,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3564613.3333333335, ans=0.04949747468305833 2023-11-28 15:35:07,003 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-28 15:35:11,598 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5650, loss[loss=0.05807, simple_loss=0.07679, pruned_loss=0.007818, audio_tagging_loss=0.01186, over 14605.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08973, pruned_loss=0.01223, audio_tagging_loss=0.008841, over 3044811.30 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:35:32,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3564746.6666666665, ans=0.1 2023-11-28 15:35:43,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3564813.3333333335, ans=0.125 2023-11-28 15:36:09,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-28 15:36:10,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-28 15:36:11,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3564946.6666666665, ans=0.125 2023-11-28 15:36:13,948 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5700, loss[loss=0.07984, simple_loss=0.1039, pruned_loss=0.01881, audio_tagging_loss=0.00908, over 14855.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09037, pruned_loss=0.0123, audio_tagging_loss=0.008857, over 3048762.13 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:36:23,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3565013.3333333335, ans=0.0 2023-11-28 15:36:25,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:28,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.736e+01 9.435e+01 1.001e+02 1.261e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 15:36:30,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:34,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:45,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3565146.6666666665, ans=0.125 2023-11-28 15:36:48,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3565146.6666666665, ans=0.125 2023-11-28 15:37:05,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3565280.0, ans=0.125 2023-11-28 15:37:11,443 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-28 15:37:16,423 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5750, loss[loss=0.05796, simple_loss=0.07576, pruned_loss=0.009136, audio_tagging_loss=0.01095, over 14530.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08949, pruned_loss=0.01203, audio_tagging_loss=0.008766, over 3054090.28 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:37:18,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3565346.6666666665, ans=0.125 2023-11-28 15:37:32,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3565413.3333333335, ans=0.125 2023-11-28 15:37:37,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0 2023-11-28 15:37:58,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3565546.6666666665, ans=0.2 2023-11-28 15:38:12,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3565613.3333333335, ans=0.2 2023-11-28 15:38:14,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-28 15:38:20,329 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5800, loss[loss=0.0507, simple_loss=0.06407, pruned_loss=0.009972, audio_tagging_loss=0.008691, over 13960.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08863, pruned_loss=0.01194, audio_tagging_loss=0.008629, over 3046089.49 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:38:35,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.604e+01 9.339e+01 1.000e+02 1.373e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 15:39:00,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5 2023-11-28 15:39:10,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3565946.6666666665, ans=0.1 2023-11-28 15:39:17,071 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-28 15:39:22,236 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5850, loss[loss=0.05599, simple_loss=0.07935, pruned_loss=0.008249, audio_tagging_loss=0.008065, over 15420.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08859, pruned_loss=0.01197, audio_tagging_loss=0.008607, over 3042423.25 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:39:27,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3566013.3333333335, ans=0.2 2023-11-28 15:39:40,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3566080.0, ans=0.125 2023-11-28 15:39:44,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3566080.0, ans=0.0 2023-11-28 15:39:48,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3566146.6666666665, ans=0.125 2023-11-28 15:39:52,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.03 vs. 
limit=15.0 2023-11-28 15:40:00,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3566213.3333333335, ans=0.04949747468305833 2023-11-28 15:40:17,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3566280.0, ans=0.125 2023-11-28 15:40:19,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-28 15:40:24,067 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5900, loss[loss=0.06481, simple_loss=0.09882, pruned_loss=0.009044, audio_tagging_loss=0.006359, over 15358.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08899, pruned_loss=0.01206, audio_tagging_loss=0.008554, over 3047988.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:40:27,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3566346.6666666665, ans=0.125 2023-11-28 15:40:29,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.75 vs. limit=22.5 2023-11-28 15:40:34,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3566346.6666666665, ans=0.125 2023-11-28 15:40:39,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.038e+01 9.170e+01 9.658e+01 1.028e+02 1.325e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 15:40:40,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566413.3333333335, ans=0.1 2023-11-28 15:40:44,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3566413.3333333335, ans=0.0 2023-11-28 15:40:58,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2023-11-28 15:41:11,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-28 15:41:18,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3566613.3333333335, ans=0.0 2023-11-28 15:41:21,375 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-28 15:41:26,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566680.0, ans=0.1 2023-11-28 15:41:26,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-28 15:41:26,969 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 5950, loss[loss=0.04165, simple_loss=0.05259, pruned_loss=0.00702, audio_tagging_loss=0.008333, over 14692.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.0887, pruned_loss=0.01214, audio_tagging_loss=0.008626, over 3047479.44 frames. 
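The optim.py:476 lines summarize the optimizer's gradient-norm statistics: the five values are the min/25%/median/75%/max of recent per-batch gradient norms, and the clipping threshold consistently equals Clipping_scale times the median (here 2.0 * 9.658e+01 = 1.932e+02). A sketch of that scheme, assuming a simple sliding window of norms; the class and parameter names are illustrative:

    import torch
    from collections import deque

    class MedianGradClipper:
        # Keeps a window of recent gradient norms, reports their quartiles,
        # and clips at clipping_scale * median, matching the logged
        # threshold being twice the logged median norm. percent-clipped is
        # the fraction of recent steps whose norm exceeded the threshold.
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)

        def step(self, parameters):
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = float(torch.norm(torch.stack([g.norm() for g in grads])))
            self.norms.append(norm)
            qs = torch.quantile(torch.tensor(list(self.norms)),
                                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * float(qs[2])
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)
            return qs, threshold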
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:41:27,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3566680.0, ans=0.2 2023-11-28 15:41:36,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3566680.0, ans=0.1 2023-11-28 15:41:48,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3566746.6666666665, ans=0.125 2023-11-28 15:42:05,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3566880.0, ans=0.125 2023-11-28 15:42:13,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-28 15:42:24,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-28 15:42:29,571 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6000, loss[loss=0.06032, simple_loss=0.08074, pruned_loss=0.01242, audio_tagging_loss=0.00753, over 14425.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08895, pruned_loss=0.0121, audio_tagging_loss=0.008559, over 3045316.03 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:42:29,571 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 15:43:05,440 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0660, 4.8990, 3.5702, 4.3565], device='cuda:1') 2023-11-28 15:43:07,308 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05761, simple_loss=0.05049, pruned_loss=0.005188, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-28 15:43:07,309 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 15:43:18,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3567013.3333333335, ans=0.2 2023-11-28 15:43:21,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3567080.0, ans=0.125 2023-11-28 15:43:22,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.761e+01 9.402e+01 1.021e+02 1.330e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 15:43:32,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3567146.6666666665, ans=0.0 2023-11-28 15:43:54,051 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:44:04,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-28 15:44:09,598 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6050, loss[loss=0.07951, simple_loss=0.1041, pruned_loss=0.01829, audio_tagging_loss=0.00915, over 15831.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08976, pruned_loss=0.01231, audio_tagging_loss=0.008437, over 3045991.41 frames. 
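The block at batch 6000 is a periodic validation pass: the dev loss is computed over a fixed cut set (the same 4681554-frame total appears at every validation in this log), and the peak CUDA memory is reported afterwards. The memory line maps directly onto the stock PyTorch counter; a minimal sketch:

    import torch

    # After a validation pass, report the peak CUDA memory seen so far,
    # as in the "Maximum memory allocated so far is 25568MB" line.
    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")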
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:44:27,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3567413.3333333335, ans=0.0 2023-11-28 15:44:33,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3567480.0, ans=0.125 2023-11-28 15:44:33,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3567480.0, ans=0.2 2023-11-28 15:44:35,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3567480.0, ans=0.2 2023-11-28 15:44:39,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3567480.0, ans=0.125 2023-11-28 15:44:40,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3567480.0, ans=0.0 2023-11-28 15:44:55,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-28 15:45:07,206 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-28 15:45:12,330 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6100, loss[loss=0.0797, simple_loss=0.1058, pruned_loss=0.01717, audio_tagging_loss=0.009651, over 14842.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08913, pruned_loss=0.01216, audio_tagging_loss=0.008471, over 3046402.24 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:45:27,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.963e+01 9.572e+01 1.025e+02 1.368e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:45:27,999 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:46:09,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-28 15:46:12,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=12.0 2023-11-28 15:46:14,250 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6150, loss[loss=0.06977, simple_loss=0.09779, pruned_loss=0.01462, audio_tagging_loss=0.006252, over 14898.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08869, pruned_loss=0.01208, audio_tagging_loss=0.008504, over 3043442.42 frames. 
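The scaling.py:1022 Whitening lines are activation diagnostics: a scalar measuring how far each group's feature covariance is from white (isotropic) is compared against a module-specific limit, and the whitening constraint only activates once the metric exceeds that limit. A sketch, assuming the metric is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the covariance (1.0 for perfectly white features, larger otherwise); this is an assumption about what is measured, not a copy of scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Returns E[eig^2] / E[eig]^2 over
        # the eigenvalues of each group's feature covariance, averaged
        # across groups: 1.0 for white features, larger when a few
        # directions dominate.
        n, c = x.shape
        g = c // num_groups
        vals = []
        for i in range(num_groups):
            xg = x[:, i * g:(i + 1) * g]
            cov = xg.T @ xg / n
            eig = torch.linalg.eigvalsh(cov)
            vals.append((eig ** 2).mean() / eig.mean() ** 2)
        return float(torch.stack(vals).mean())

    x = torch.randn(2000, 384)
    print(whitening_metric(x))   # close to 1.0: nearly white
    x[:, 0] *= 10.0
    print(whitening_metric(x))   # grows once one direction dominates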
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:46:30,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3568080.0, ans=0.125 2023-11-28 15:47:01,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3568213.3333333335, ans=0.125 2023-11-28 15:47:03,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3568280.0, ans=0.0 2023-11-28 15:47:11,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-28 15:47:13,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3568280.0, ans=0.04949747468305833 2023-11-28 15:47:17,036 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6200, loss[loss=0.0416, simple_loss=0.05315, pruned_loss=0.006476, audio_tagging_loss=0.00855, over 15229.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08859, pruned_loss=0.01213, audio_tagging_loss=0.008657, over 3043756.57 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:47:27,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3568413.3333333335, ans=0.125 2023-11-28 15:47:33,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 8.984e+01 9.633e+01 1.042e+02 1.273e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 15:47:40,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3568480.0, ans=0.125 2023-11-28 15:47:40,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3568480.0, ans=0.125 2023-11-28 15:48:09,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3568613.3333333335, ans=0.1 2023-11-28 15:48:14,152 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-28 15:48:17,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3568680.0, ans=0.125 2023-11-28 15:48:19,444 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6250, loss[loss=0.05453, simple_loss=0.07379, pruned_loss=0.01079, audio_tagging_loss=0.006843, over 14362.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08731, pruned_loss=0.01205, audio_tagging_loss=0.008815, over 3040734.89 frames. 
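The bracketed loss terms are not independent: throughout this log the total satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, with the weights inferred from the logged numbers themselves. A quick arithmetic check against the batch 6200 totals above:

    # tot_loss[loss=0.06508, simple_loss=0.08859, pruned_loss=0.01213,
    # audio_tagging_loss=0.008657] from the batch 6200 line above.
    simple_loss, pruned_loss, audio_tagging_loss = 0.08859, 0.01213, 0.008657
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 5))  # 0.06508, matching the logged total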
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:48:33,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3568746.6666666665, ans=0.0 2023-11-28 15:48:36,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3568746.6666666665, ans=0.2 2023-11-28 15:48:51,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3568813.3333333335, ans=0.125 2023-11-28 15:48:53,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3568813.3333333335, ans=0.0 2023-11-28 15:48:54,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3568813.3333333335, ans=0.0 2023-11-28 15:48:56,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3568880.0, ans=0.125 2023-11-28 15:49:09,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-28 15:49:16,930 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-28 15:49:20,441 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:49:21,430 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6300, loss[loss=0.06718, simple_loss=0.08875, pruned_loss=0.01375, audio_tagging_loss=0.009059, over 14612.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08889, pruned_loss=0.01226, audio_tagging_loss=0.00882, over 3045327.98 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:49:26,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3569013.3333333335, ans=0.0 2023-11-28 15:49:26,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3569013.3333333335, ans=0.125 2023-11-28 15:49:38,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.816e+01 9.438e+01 1.010e+02 1.307e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 15:49:41,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3569080.0, ans=0.0 2023-11-28 15:49:41,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3569080.0, ans=0.125 2023-11-28 15:49:44,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3569080.0, ans=0.125 2023-11-28 15:50:19,307 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-28 15:50:20,575 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:50:24,773 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6350, loss[loss=0.06093, simple_loss=0.08428, pruned_loss=0.008699, audio_tagging_loss=0.01009, over 15996.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08873, pruned_loss=0.01216, audio_tagging_loss=0.008822, over 3048160.82 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:50:35,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3569413.3333333335, ans=0.0 2023-11-28 15:50:36,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3569413.3333333335, ans=0.0 2023-11-28 15:51:01,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3569546.6666666665, ans=0.125 2023-11-28 15:51:21,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-28 15:51:26,577 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6400, loss[loss=0.06563, simple_loss=0.09237, pruned_loss=0.01231, audio_tagging_loss=0.007134, over 14791.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08879, pruned_loss=0.01211, audio_tagging_loss=0.008837, over 3049815.63 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:51:32,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-28 15:51:43,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.811e+01 9.428e+01 1.008e+02 1.163e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 15:51:56,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3569813.3333333335, ans=0.125 2023-11-28 15:52:01,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3569813.3333333335, ans=0.1 2023-11-28 15:52:19,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2023-11-28 15:52:21,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3569946.6666666665, ans=0.125 2023-11-28 15:52:24,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=22.5 2023-11-28 15:52:24,900 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-28 15:52:30,154 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6450, loss[loss=0.05628, simple_loss=0.07409, pruned_loss=0.008551, audio_tagging_loss=0.01069, over 16033.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08879, pruned_loss=0.01197, audio_tagging_loss=0.008929, over 3046125.15 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:52:32,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570013.3333333335, ans=0.1 2023-11-28 15:52:44,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3570080.0, ans=0.1 2023-11-28 15:52:48,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3570080.0, ans=0.125 2023-11-28 15:52:53,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.57 vs. 
limit=15.0 2023-11-28 15:53:04,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3570146.6666666665, ans=0.125 2023-11-28 15:53:23,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570280.0, ans=0.1 2023-11-28 15:53:28,154 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-28 15:53:32,796 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6500, loss[loss=0.06159, simple_loss=0.08605, pruned_loss=0.01101, audio_tagging_loss=0.007554, over 15140.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08859, pruned_loss=0.01182, audio_tagging_loss=0.008909, over 3044691.68 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:53:37,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3570346.6666666665, ans=0.5 2023-11-28 15:53:48,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 8.856e+01 9.321e+01 9.995e+01 1.237e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 15:54:01,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3570480.0, ans=0.125 2023-11-28 15:54:18,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570546.6666666665, ans=0.1 2023-11-28 15:54:30,574 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-28 15:54:31,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3570613.3333333335, ans=0.0 2023-11-28 15:54:34,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3570680.0, ans=0.125 2023-11-28 15:54:34,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3570680.0, ans=0.0 2023-11-28 15:54:35,595 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6550, loss[loss=0.05844, simple_loss=0.0853, pruned_loss=0.008934, audio_tagging_loss=0.00686, over 15402.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.089, pruned_loss=0.0121, audio_tagging_loss=0.008795, over 3045935.37 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:54:48,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3570746.6666666665, ans=0.125 2023-11-28 15:54:50,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3570746.6666666665, ans=0.0 2023-11-28 15:55:13,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. 
limit=15.0 2023-11-28 15:55:26,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3570946.6666666665, ans=0.1 2023-11-28 15:55:27,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3570946.6666666665, ans=0.1 2023-11-28 15:55:33,419 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-28 15:55:38,053 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6600, loss[loss=0.05028, simple_loss=0.07806, pruned_loss=0.003785, audio_tagging_loss=0.007466, over 14731.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08887, pruned_loss=0.01207, audio_tagging_loss=0.008644, over 3039470.42 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:55:55,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.913e+01 9.644e+01 1.048e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 15:55:59,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2023-11-28 15:56:22,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3571213.3333333335, ans=0.125 2023-11-28 15:56:35,001 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-28 15:56:40,920 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6650, loss[loss=0.06495, simple_loss=0.08886, pruned_loss=0.01291, audio_tagging_loss=0.007617, over 15397.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08937, pruned_loss=0.01217, audio_tagging_loss=0.008591, over 3037308.46 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:56:52,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3571413.3333333335, ans=0.125 2023-11-28 15:57:00,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2023-11-28 15:57:11,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3571480.0, ans=0.0 2023-11-28 15:57:38,157 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-28 15:57:41,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3571680.0, ans=0.125 2023-11-28 15:57:42,800 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6700, loss[loss=0.04602, simple_loss=0.05532, pruned_loss=0.009843, audio_tagging_loss=0.008512, over 15121.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08964, pruned_loss=0.01222, audio_tagging_loss=0.008551, over 3032664.47 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:57:59,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.880e+01 9.649e+01 1.029e+02 1.368e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 15:58:37,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3571946.6666666665, ans=0.0 2023-11-28 15:58:39,988 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-28 15:58:45,043 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6750, loss[loss=0.07095, simple_loss=0.1006, pruned_loss=0.01415, audio_tagging_loss=0.006489, over 15203.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08973, pruned_loss=0.01222, audio_tagging_loss=0.008568, over 3030347.80 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:58:48,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0 2023-11-28 15:59:16,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-11-28 15:59:24,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2023-11-28 15:59:24,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-11-28 15:59:29,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3572213.3333333335, ans=0.125 2023-11-28 15:59:42,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-28 15:59:47,499 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6800, loss[loss=0.04361, simple_loss=0.05244, pruned_loss=0.007801, audio_tagging_loss=0.009588, over 15947.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08961, pruned_loss=0.01219, audio_tagging_loss=0.008508, over 3032876.45 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:59:57,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2023-11-28 15:59:59,181 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:00:03,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3572413.3333333335, ans=0.0 2023-11-28 16:00:04,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.079e+01 9.608e+01 1.020e+02 1.284e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 16:00:06,150 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:00:09,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3572413.3333333335, ans=15.0 2023-11-28 16:00:14,062 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:00:23,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.60 vs. limit=22.5 2023-11-28 16:00:37,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3572613.3333333335, ans=0.125 2023-11-28 16:00:40,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3572613.3333333335, ans=0.0 2023-11-28 16:00:45,851 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-28 16:00:50,607 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6850, loss[loss=0.07738, simple_loss=0.1162, pruned_loss=0.01235, audio_tagging_loss=0.006911, over 15353.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09061, pruned_loss=0.01216, audio_tagging_loss=0.008438, over 3034689.70 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:00:59,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=22.5 2023-11-28 16:01:09,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3572746.6666666665, ans=0.2 2023-11-28 16:01:12,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3572746.6666666665, ans=0.125 2023-11-28 16:01:21,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3572813.3333333335, ans=0.07 2023-11-28 16:01:35,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3572880.0, ans=0.125 2023-11-28 16:01:37,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3572880.0, ans=0.125 2023-11-28 16:01:46,823 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-28 16:01:52,223 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6900, loss[loss=0.07362, simple_loss=0.09628, pruned_loss=0.01654, audio_tagging_loss=0.008946, over 15226.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09109, pruned_loss=0.01217, audio_tagging_loss=0.00842, over 3040789.50 frames. 
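The scaling.py:1118 WithLoss lines report the summed value of an auxiliary penalty attached to the self-attention weights; loss-sum=0.000e+00 means the penalty is currently inactive. A hedged sketch of one such mechanism, a hinge penalty that stays exactly zero while the activations remain within a limit (the limit and the wiring are assumptions, not scaling.py's actual code):

    import torch

    def attention_penalty(attn_weights: torch.Tensor,
                          limit: float = 25.0) -> torch.Tensor:
        # Zero while |weights| stay under the limit (hence the logged
        # loss-sum=0.000e+00); grows linearly once they overshoot. Added to
        # the training loss so gradients push the weights back in bounds.
        return torch.relu(attn_weights.abs() - limit).sum()

    w = torch.randn(4, 8, 8) * 2.0
    print(float(attention_penalty(w)))  # 0.0 for in-bounds weights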
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:02:10,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.778e+01 9.481e+01 1.016e+02 1.477e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 16:02:26,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3573146.6666666665, ans=0.125 2023-11-28 16:02:26,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3573146.6666666665, ans=0.0 2023-11-28 16:02:33,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3573213.3333333335, ans=0.0 2023-11-28 16:02:45,252 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:02:51,160 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-28 16:02:51,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=22.5 2023-11-28 16:02:58,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-28 16:02:58,689 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 6950, loss[loss=0.06599, simple_loss=0.08951, pruned_loss=0.01384, audio_tagging_loss=0.007403, over 15895.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09055, pruned_loss=0.01201, audio_tagging_loss=0.00849, over 3049704.49 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:03:11,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3573413.3333333335, ans=0.125 2023-11-28 16:03:18,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3573413.3333333335, ans=0.2 2023-11-28 16:03:19,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3573413.3333333335, ans=0.95 2023-11-28 16:03:23,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3573480.0, ans=0.1 2023-11-28 16:03:31,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3573480.0, ans=0.2 2023-11-28 16:03:34,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3573480.0, ans=0.125 2023-11-28 16:03:56,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-28 16:04:01,258 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7000, loss[loss=0.0575, simple_loss=0.07356, pruned_loss=0.009315, audio_tagging_loss=0.0114, over 14560.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.09017, pruned_loss=0.01191, audio_tagging_loss=0.008465, over 3052954.24 frames. 
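The WARNING above shows the data-side sanity filter at work: this 1-second AudioSet clip carries a dummy transcript, and its 100 feature frames shrink to 23 after 4x subsampling, fewer than its 24 BPE tokens, so no valid transducer alignment exists and the cut is dropped. A sketch of the check, assuming (T - 7) // 4 style subsampling arithmetic (the exact offset is an assumption; the logged 100 -> 23 is consistent with it):

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        # Drop cuts that end up with fewer encoder frames than target
        # tokens: the transducer loss has no feasible alignment for them.
        frames_after = (num_frames - 7) // subsampling_factor  # 100 -> 23
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> the dummy cut above is excluded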
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:04:18,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.731e+01 9.457e+01 1.025e+02 1.203e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 16:04:24,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3573746.6666666665, ans=0.125 2023-11-28 16:04:32,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-11-28 16:04:58,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-28 16:05:03,924 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7050, loss[loss=0.04994, simple_loss=0.05981, pruned_loss=0.008277, audio_tagging_loss=0.01176, over 15715.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08854, pruned_loss=0.0117, audio_tagging_loss=0.008583, over 3046750.78 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:05:11,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:05:12,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3574013.3333333335, ans=0.125 2023-11-28 16:05:26,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3574080.0, ans=0.125 2023-11-28 16:05:27,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3574146.6666666665, ans=0.125 2023-11-28 16:05:50,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3574213.3333333335, ans=0.0 2023-11-28 16:05:53,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3574280.0, ans=0.0 2023-11-28 16:06:01,023 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-28 16:06:03,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3574280.0, ans=0.125 2023-11-28 16:06:05,782 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7100, loss[loss=0.09459, simple_loss=0.1391, pruned_loss=0.01994, audio_tagging_loss=0.00508, over 16149.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08885, pruned_loss=0.01185, audio_tagging_loss=0.008681, over 3054594.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:06:21,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3574413.3333333335, ans=0.025 2023-11-28 16:06:23,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 8.973e+01 9.710e+01 1.043e+02 1.342e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 16:06:24,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. 
limit=10.0 2023-11-28 16:06:30,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3574480.0, ans=0.0 2023-11-28 16:07:04,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-28 16:07:05,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3574613.3333333335, ans=0.1 2023-11-28 16:07:05,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5 2023-11-28 16:07:09,630 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7150, loss[loss=0.05551, simple_loss=0.0698, pruned_loss=0.01001, audio_tagging_loss=0.0106, over 15190.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08928, pruned_loss=0.01195, audio_tagging_loss=0.008772, over 3051207.19 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:07:14,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2023-11-28 16:07:19,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3574680.0, ans=0.125 2023-11-28 16:08:07,035 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-28 16:08:07,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3574946.6666666665, ans=0.125 2023-11-28 16:08:12,293 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7200, loss[loss=0.06506, simple_loss=0.09, pruned_loss=0.01035, audio_tagging_loss=0.00971, over 15087.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08974, pruned_loss=0.01212, audio_tagging_loss=0.008818, over 3047824.57 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:08:12,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3575013.3333333335, ans=0.0 2023-11-28 16:08:21,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2023-11-28 16:08:29,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.876e+01 9.486e+01 1.031e+02 1.370e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 16:08:35,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3575080.0, ans=0.125 2023-11-28 16:08:42,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3575146.6666666665, ans=0.125 2023-11-28 16:09:10,437 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-28 16:09:15,000 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7250, loss[loss=0.06898, simple_loss=0.09854, pruned_loss=0.01269, audio_tagging_loss=0.007021, over 13533.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08999, pruned_loss=0.01229, audio_tagging_loss=0.008868, over 3050643.24 frames. 
], batch size: 52, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:09:33,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3575413.3333333335, ans=0.0 2023-11-28 16:09:38,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3575480.0, ans=0.0 2023-11-28 16:10:02,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3575546.6666666665, ans=0.125 2023-11-28 16:10:03,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-28 16:10:12,477 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536350 2023-11-28 16:10:16,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-28 16:10:17,742 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7300, loss[loss=0.04774, simple_loss=0.06602, pruned_loss=0.006629, audio_tagging_loss=0.008106, over 16072.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0893, pruned_loss=0.01217, audio_tagging_loss=0.00886, over 3043311.10 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:10:20,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3575680.0, ans=0.1 2023-11-28 16:10:22,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3575680.0, ans=0.125 2023-11-28 16:10:34,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.874e+01 9.514e+01 1.009e+02 1.390e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:10:36,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3575746.6666666665, ans=0.125 2023-11-28 16:10:38,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3575746.6666666665, ans=0.125 2023-11-28 16:11:00,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3575880.0, ans=0.0 2023-11-28 16:11:14,337 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536400 2023-11-28 16:11:17,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3575946.6666666665, ans=15.0 2023-11-28 16:11:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3576013.3333333335, ans=0.05 2023-11-28 16:11:19,294 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7350, loss[loss=0.04647, simple_loss=0.05768, pruned_loss=0.007946, audio_tagging_loss=0.009685, over 16411.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08861, pruned_loss=0.01196, audio_tagging_loss=0.008743, over 3043713.49 frames. 
], batch size: 64, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:11:40,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3576080.0, ans=0.125 2023-11-28 16:11:41,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3576080.0, ans=0.0 2023-11-28 16:11:53,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3576146.6666666665, ans=0.0 2023-11-28 16:12:17,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536450 2023-11-28 16:12:22,051 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7400, loss[loss=0.0705, simple_loss=0.1039, pruned_loss=0.01261, audio_tagging_loss=0.00592, over 15111.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08803, pruned_loss=0.01188, audio_tagging_loss=0.008618, over 3045113.31 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:12:22,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3576346.6666666665, ans=0.2 2023-11-28 16:12:41,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.849e+01 9.470e+01 1.002e+02 1.238e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 16:12:50,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3576480.0, ans=0.2 2023-11-28 16:12:58,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.34 vs. limit=10.0 2023-11-28 16:13:14,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2023-11-28 16:13:19,437 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536500 2023-11-28 16:13:23,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2023-11-28 16:13:23,941 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7450, loss[loss=0.08073, simple_loss=0.11, pruned_loss=0.01764, audio_tagging_loss=0.008086, over 15909.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08842, pruned_loss=0.01214, audio_tagging_loss=0.008602, over 3044479.17 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:13:31,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3576680.0, ans=0.125 2023-11-28 16:13:37,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3576746.6666666665, ans=0.125 2023-11-28 16:13:57,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3576813.3333333335, ans=0.0 2023-11-28 16:14:01,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3576880.0, ans=0.0 2023-11-28 16:14:18,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3576946.6666666665, ans=15.0 2023-11-28 16:14:21,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536550 2023-11-28 16:14:26,733 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7500, loss[loss=0.06554, simple_loss=0.08324, pruned_loss=0.01362, audio_tagging_loss=0.0103, over 13748.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08896, pruned_loss=0.01228, audio_tagging_loss=0.008502, over 3048531.91 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:14:27,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3577013.3333333335, ans=0.125 2023-11-28 16:14:31,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3577013.3333333335, ans=0.125 2023-11-28 16:14:31,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3577013.3333333335, ans=0.1 2023-11-28 16:14:40,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3577080.0, ans=0.125 2023-11-28 16:14:47,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.969e+01 9.602e+01 1.048e+02 1.798e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 16:14:50,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3577080.0, ans=0.125 2023-11-28 16:14:57,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=15.0 2023-11-28 16:15:10,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3577213.3333333335, ans=0.125 2023-11-28 16:15:19,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3577280.0, ans=0.125 2023-11-28 16:15:24,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536600 2023-11-28 16:15:29,969 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7550, loss[loss=0.06921, simple_loss=0.09021, pruned_loss=0.01693, audio_tagging_loss=0.007181, over 14700.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.0892, pruned_loss=0.0124, audio_tagging_loss=0.008458, over 3045387.29 frames. 
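The grad_scale field in the loss lines is fp16 dynamic loss scaling at work: the scale halves when a step produces non-finite gradients (16 -> 8 between batches 7400 and 7450 above) and doubles back after a stretch of clean steps (8 -> 16 -> 32 by batch 8000 below). A minimal sketch using the stock torch.cuda.amp API; growth_interval here is illustrative:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=1000)

    # Typical use inside the training loop:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)  # skipped, and scale halved, on inf/nan grads
    #   scaler.update()         # doubled after growth_interval clean steps
    print(scaler.get_scale())   # 16.0 when CUDA is available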
], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:15:44,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3577413.3333333335, ans=0.0 2023-11-28 16:16:01,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3577480.0, ans=0.0 2023-11-28 16:16:14,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3577546.6666666665, ans=0.125 2023-11-28 16:16:18,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-28 16:16:21,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2023-11-28 16:16:25,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3577613.3333333335, ans=0.2 2023-11-28 16:16:27,773 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536650 2023-11-28 16:16:31,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3577680.0, ans=0.125 2023-11-28 16:16:32,313 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7600, loss[loss=0.05397, simple_loss=0.0727, pruned_loss=0.008932, audio_tagging_loss=0.008688, over 14909.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08883, pruned_loss=0.01233, audio_tagging_loss=0.008523, over 3048984.20 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:16:46,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3577746.6666666665, ans=0.1 2023-11-28 16:16:51,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.772e+01 9.466e+01 1.004e+02 1.199e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 16:17:30,282 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536700 2023-11-28 16:17:34,911 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7650, loss[loss=0.07741, simple_loss=0.109, pruned_loss=0.01649, audio_tagging_loss=0.006414, over 16252.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08857, pruned_loss=0.01216, audio_tagging_loss=0.008442, over 3040285.71 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:18:01,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3578146.6666666665, ans=0.125 2023-11-28 16:18:02,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. 
limit=6.0 2023-11-28 16:18:14,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3578213.3333333335, ans=0.2 2023-11-28 16:18:16,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3578213.3333333335, ans=0.0 2023-11-28 16:18:22,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3578213.3333333335, ans=0.95 2023-11-28 16:18:31,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3578280.0, ans=0.0 2023-11-28 16:18:32,268 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536750 2023-11-28 16:18:37,478 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7700, loss[loss=0.05423, simple_loss=0.07584, pruned_loss=0.005601, audio_tagging_loss=0.01071, over 15340.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08944, pruned_loss=0.01224, audio_tagging_loss=0.008424, over 3035321.69 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:18:40,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3578346.6666666665, ans=0.125 2023-11-28 16:18:47,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3578346.6666666665, ans=0.125 2023-11-28 16:18:57,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.045e+01 9.681e+01 1.026e+02 1.409e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 16:19:08,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3578480.0, ans=0.2 2023-11-28 16:19:14,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-28 16:19:31,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3578613.3333333335, ans=0.2 2023-11-28 16:19:33,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536800 2023-11-28 16:19:34,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-11-28 16:19:37,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3578613.3333333335, ans=0.125 2023-11-28 16:19:40,002 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7750, loss[loss=0.08682, simple_loss=0.1253, pruned_loss=0.01644, audio_tagging_loss=0.007736, over 16086.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08907, pruned_loss=0.01221, audio_tagging_loss=0.008533, over 3039304.12 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:19:44,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3578680.0, ans=0.125 2023-11-28 16:20:02,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3578746.6666666665, ans=0.2 2023-11-28 16:20:05,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3578813.3333333335, ans=0.035 2023-11-28 16:20:08,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3578813.3333333335, ans=0.0 2023-11-28 16:20:21,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3578880.0, ans=0.125 2023-11-28 16:20:31,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3578946.6666666665, ans=0.0 2023-11-28 16:20:37,962 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536850 2023-11-28 16:20:42,592 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7800, loss[loss=0.07087, simple_loss=0.09503, pruned_loss=0.01437, audio_tagging_loss=0.008991, over 14473.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08875, pruned_loss=0.01202, audio_tagging_loss=0.008652, over 3042176.23 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:20:53,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3579080.0, ans=0.125 2023-11-28 16:20:57,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3579080.0, ans=0.0 2023-11-28 16:20:57,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3579080.0, ans=0.125 2023-11-28 16:20:58,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3579080.0, ans=0.125 2023-11-28 16:21:01,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 9.025e+01 9.590e+01 1.021e+02 1.203e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:21:07,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3579146.6666666665, ans=0.125 2023-11-28 16:21:35,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3579280.0, ans=0.125 2023-11-28 16:21:38,664 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536900 2023-11-28 16:21:43,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3579346.6666666665, ans=0.125 2023-11-28 16:21:43,992 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7850, loss[loss=0.09786, simple_loss=0.1453, pruned_loss=0.01723, audio_tagging_loss=0.007997, over 15764.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08949, pruned_loss=0.01228, audio_tagging_loss=0.00871, over 3037572.49 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:22:18,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3579480.0, ans=0.125 2023-11-28 16:22:31,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-28 16:22:40,591 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 536950 2023-11-28 16:22:45,263 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7900, loss[loss=0.06597, simple_loss=0.0885, pruned_loss=0.01436, audio_tagging_loss=0.007366, over 15528.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08928, pruned_loss=0.0122, audio_tagging_loss=0.008741, over 3037807.74 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:23:05,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.028e+01 9.515e+01 1.023e+02 1.434e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:23:20,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3579813.3333333335, ans=0.1 2023-11-28 16:23:43,838 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537000 2023-11-28 16:23:48,702 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 7950, loss[loss=0.06044, simple_loss=0.08104, pruned_loss=0.01147, audio_tagging_loss=0.008447, over 15987.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08944, pruned_loss=0.01229, audio_tagging_loss=0.008863, over 3045218.29 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:23:54,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3580013.3333333335, ans=0.125 2023-11-28 16:24:07,213 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:24:09,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3580080.0, ans=0.125 2023-11-28 16:24:09,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3580080.0, ans=0.0 2023-11-28 16:24:20,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3580146.6666666665, ans=0.125 2023-11-28 16:24:31,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3580213.3333333335, ans=0.125 2023-11-28 16:24:45,604 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537050 2023-11-28 16:24:50,106 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8000, loss[loss=0.05852, simple_loss=0.08014, pruned_loss=0.008329, audio_tagging_loss=0.01012, over 16050.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08929, pruned_loss=0.01216, audio_tagging_loss=0.008892, over 3043731.49 frames. 
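
Each optim.py:476 line prints five grad-norm statistics (read here as min, 25%, median, 75%, max) plus a clipping threshold, and throughout this excerpt the threshold equals Clipping_scale times the middle value. A quick check against the 16:23:05 line above; treating the threshold as 2x a median-like statistic is an assumption, not read from optim.py:

quartiles = [80.56, 90.28, 95.15, 102.3, 143.4]  # 8.056e+01 ... 1.434e+02
threshold = 2.0 * quartiles[2]                   # Clipping_scale * median
print(threshold)  # 190.3, i.e. threshold=1.903e+02

Consistently, percent-clipped stays at 0.0 whenever the max grad-norm sits below the threshold, and rises (e.g. the percent-clipped=1.0 entries further down, where the max reaches 3.353e+02) only when it does not.
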
], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:25:03,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3580413.3333333335, ans=0.1 2023-11-28 16:25:11,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.789e+01 9.588e+01 1.026e+02 1.289e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:25:23,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3580480.0, ans=0.0 2023-11-28 16:25:23,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3580480.0, ans=0.125 2023-11-28 16:25:47,522 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537100 2023-11-28 16:25:52,294 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8050, loss[loss=0.07289, simple_loss=0.0982, pruned_loss=0.01247, audio_tagging_loss=0.01133, over 15291.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08947, pruned_loss=0.01221, audio_tagging_loss=0.008952, over 3049244.34 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:26:02,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3580680.0, ans=0.0 2023-11-28 16:26:49,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537150 2023-11-28 16:26:54,665 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8100, loss[loss=0.05063, simple_loss=0.06492, pruned_loss=0.007717, audio_tagging_loss=0.01046, over 14468.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0896, pruned_loss=0.01217, audio_tagging_loss=0.008843, over 3038562.36 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:27:18,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.911e+01 9.512e+01 1.026e+02 1.565e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 16:27:54,014 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537200 2023-11-28 16:27:58,912 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8150, loss[loss=0.07047, simple_loss=0.1049, pruned_loss=0.01077, audio_tagging_loss=0.007231, over 15383.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08945, pruned_loss=0.01205, audio_tagging_loss=0.008706, over 3047582.66 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:28:37,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3581546.6666666665, ans=0.0 2023-11-28 16:28:56,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537250 2023-11-28 16:28:58,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3581613.3333333335, ans=0.0 2023-11-28 16:29:01,124 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8200, loss[loss=0.05925, simple_loss=0.07354, pruned_loss=0.01002, audio_tagging_loss=0.01246, over 14368.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08976, pruned_loss=0.01206, audio_tagging_loss=0.008602, over 3045990.57 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:29:03,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3581680.0, ans=0.125 2023-11-28 16:29:05,710 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:29:23,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.010e+01 9.552e+01 1.037e+02 1.390e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 16:29:38,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5 2023-11-28 16:29:39,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3581880.0, ans=0.1 2023-11-28 16:29:40,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3581880.0, ans=0.07 2023-11-28 16:29:58,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537300 2023-11-28 16:30:02,800 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8250, loss[loss=0.05526, simple_loss=0.07064, pruned_loss=0.006533, audio_tagging_loss=0.01341, over 15657.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09075, pruned_loss=0.01199, audio_tagging_loss=0.008532, over 3050696.36 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:30:06,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-11-28 16:30:13,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3582013.3333333335, ans=0.125 2023-11-28 16:30:14,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3582080.0, ans=0.125 2023-11-28 16:30:17,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3582080.0, ans=0.2 2023-11-28 16:30:31,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2023-11-28 16:30:45,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3582213.3333333335, ans=0.0 2023-11-28 16:30:56,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3582280.0, ans=0.125 2023-11-28 16:31:00,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537350 2023-11-28 16:31:06,270 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8300, loss[loss=0.06001, simple_loss=0.07941, pruned_loss=0.01298, audio_tagging_loss=0.007327, over 16506.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09085, pruned_loss=0.01205, audio_tagging_loss=0.008542, over 3056852.93 frames. 
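
The train_asr.py:1481 WARNINGs drop AudioSet placeholder cuts whose subsampled length cannot cover their token sequence. The logged 100 -> 23 frame reduction matches the usual Conv2d subsampling arithmetic, sketched below; the exact filter in train_asr.py is assumed, not quoted:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # ((T - 7) // 2 + 1) // 2 reproduces the logged 100 -> 23
    T = ((num_frames - 7) // 2 + 1) // 2
    return T >= num_tokens

print(keep_cut(100, 24))  # False: 23 output frames < 24 tokens, so the cut is excluded
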
], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:31:28,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.841e+01 9.438e+01 1.013e+02 1.279e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 16:31:45,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3582546.6666666665, ans=0.0 2023-11-28 16:31:49,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3582546.6666666665, ans=0.125 2023-11-28 16:31:53,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582546.6666666665, ans=0.1 2023-11-28 16:32:01,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3582613.3333333335, ans=0.125 2023-11-28 16:32:03,975 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537400 2023-11-28 16:32:08,981 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8350, loss[loss=0.07818, simple_loss=0.1164, pruned_loss=0.01332, audio_tagging_loss=0.006686, over 15337.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09023, pruned_loss=0.01199, audio_tagging_loss=0.008456, over 3060258.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:32:09,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3582680.0, ans=0.125 2023-11-28 16:32:13,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=15.0 2023-11-28 16:32:19,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3582680.0, ans=0.0 2023-11-28 16:32:20,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582746.6666666665, ans=0.1 2023-11-28 16:32:55,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3582880.0, ans=0.0 2023-11-28 16:32:56,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582880.0, ans=0.1 2023-11-28 16:33:03,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-28 16:33:06,826 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537450 2023-11-28 16:33:07,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-28 16:33:11,443 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8400, loss[loss=0.07725, simple_loss=0.1093, pruned_loss=0.01609, audio_tagging_loss=0.006503, over 16228.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09034, pruned_loss=0.01213, audio_tagging_loss=0.008505, over 3065258.54 frames. 
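
The scaling.py:1022 Whitening lines compare a per-module activation statistic ("metric") against a limit. One plausible reading, sketched here as an assumption about scaling.py rather than a quote of it: the metric is the ratio of the mean squared eigenvalue of the feature covariance to its squared mean eigenvalue, which equals 1.0 for perfectly whitened activations and grows as the covariance becomes anisotropic, so values under the limit (e.g. metric=12.23 vs. limit=15.0 above) need no correction:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); result >= 1.0, == 1.0 iff covariance ∝ identity
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(4000, 192)  # roughly white activations
print(whitening_metric(x))  # ~1.05, far below a limit like 15.0
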
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:33:12,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3583013.3333333335, ans=0.015 2023-11-28 16:33:26,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3583080.0, ans=0.0 2023-11-28 16:33:34,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.936e+01 9.656e+01 1.030e+02 3.353e+02, threshold=1.931e+02, percent-clipped=1.0 2023-11-28 16:33:58,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3583213.3333333335, ans=0.125 2023-11-28 16:34:07,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3583280.0, ans=0.125 2023-11-28 16:34:09,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537500 2023-11-28 16:34:14,433 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8450, loss[loss=0.05303, simple_loss=0.07121, pruned_loss=0.007481, audio_tagging_loss=0.009939, over 14335.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0898, pruned_loss=0.01202, audio_tagging_loss=0.008566, over 3055450.42 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:34:24,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-28 16:34:45,161 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:35:13,102 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537550 2023-11-28 16:35:17,725 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8500, loss[loss=0.06367, simple_loss=0.09391, pruned_loss=0.009183, audio_tagging_loss=0.007533, over 16155.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08937, pruned_loss=0.01206, audio_tagging_loss=0.008611, over 3049603.89 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:35:26,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.20 vs. 
limit=15.0 2023-11-28 16:35:36,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-28 16:35:37,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-28 16:35:39,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3583746.6666666665, ans=0.0 2023-11-28 16:35:40,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.912e+01 9.575e+01 1.015e+02 1.303e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 16:35:55,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3583880.0, ans=0.125 2023-11-28 16:36:02,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3583880.0, ans=0.1 2023-11-28 16:36:14,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537600 2023-11-28 16:36:14,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3583946.6666666665, ans=0.125 2023-11-28 16:36:16,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3583946.6666666665, ans=0.0 2023-11-28 16:36:19,732 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8550, loss[loss=0.06926, simple_loss=0.1023, pruned_loss=0.01266, audio_tagging_loss=0.00545, over 15568.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08982, pruned_loss=0.01221, audio_tagging_loss=0.008583, over 3057678.22 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:36:32,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3584080.0, ans=0.125 2023-11-28 16:36:47,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3584146.6666666665, ans=0.125 2023-11-28 16:37:00,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3584213.3333333335, ans=0.125 2023-11-28 16:37:10,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3584280.0, ans=0.05 2023-11-28 16:37:16,995 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537650 2023-11-28 16:37:21,655 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8600, loss[loss=0.07626, simple_loss=0.1021, pruned_loss=0.01582, audio_tagging_loss=0.009387, over 14972.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08905, pruned_loss=0.01209, audio_tagging_loss=0.008738, over 3052044.13 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:37:43,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3584413.3333333335, ans=0.0 2023-11-28 16:37:44,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 8.790e+01 9.576e+01 1.011e+02 1.183e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 16:37:48,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3584480.0, ans=0.0 2023-11-28 16:37:51,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3584480.0, ans=0.1 2023-11-28 16:37:52,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3584480.0, ans=0.0 2023-11-28 16:38:11,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3584613.3333333335, ans=0.125 2023-11-28 16:38:18,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0 2023-11-28 16:38:18,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537700 2023-11-28 16:38:23,729 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8650, loss[loss=0.06416, simple_loss=0.08126, pruned_loss=0.01378, audio_tagging_loss=0.009743, over 14785.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08976, pruned_loss=0.01215, audio_tagging_loss=0.008739, over 3052185.89 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:38:50,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3584813.3333333335, ans=0.0 2023-11-28 16:39:04,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3584880.0, ans=0.0 2023-11-28 16:39:11,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3584880.0, ans=0.125 2023-11-28 16:39:21,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537750 2023-11-28 16:39:21,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3584946.6666666665, ans=0.0 2023-11-28 16:39:22,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3584946.6666666665, ans=0.0 2023-11-28 16:39:26,606 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8700, loss[loss=0.04659, simple_loss=0.05444, pruned_loss=0.007149, audio_tagging_loss=0.01222, over 14442.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08928, pruned_loss=0.01202, audio_tagging_loss=0.00882, over 3059338.83 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:39:48,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.145e+01 9.850e+01 1.054e+02 1.476e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-28 16:40:14,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585213.3333333335, ans=0.1 2023-11-28 16:40:16,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3585280.0, ans=0.04949747468305833 2023-11-28 16:40:19,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3585280.0, ans=0.2 2023-11-28 16:40:24,492 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537800 2023-11-28 16:40:29,356 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8750, loss[loss=0.08843, simple_loss=0.1165, pruned_loss=0.02173, audio_tagging_loss=0.00845, over 15825.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08997, pruned_loss=0.01204, audio_tagging_loss=0.008884, over 3057562.73 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:40:36,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3585346.6666666665, ans=0.07 2023-11-28 16:40:47,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3585413.3333333335, ans=0.2 2023-11-28 16:40:51,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=22.5 2023-11-28 16:40:55,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3585480.0, ans=0.125 2023-11-28 16:40:57,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3585480.0, ans=0.125 2023-11-28 16:41:10,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3585546.6666666665, ans=0.0 2023-11-28 16:41:13,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585546.6666666665, ans=0.1 2023-11-28 16:41:26,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537850 2023-11-28 16:41:28,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-28 16:41:31,147 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8800, loss[loss=0.07024, simple_loss=0.09548, pruned_loss=0.0135, audio_tagging_loss=0.008996, over 14798.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08964, pruned_loss=0.01193, audio_tagging_loss=0.008951, over 3056514.39 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:41:36,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3585680.0, ans=10.0 2023-11-28 16:41:46,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3585746.6666666665, ans=0.125 2023-11-28 16:41:51,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-28 16:41:54,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.010e+01 9.598e+01 1.030e+02 1.176e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 16:41:55,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3585813.3333333335, ans=0.0 2023-11-28 16:42:00,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3585813.3333333335, ans=0.125 2023-11-28 16:42:11,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2023-11-28 16:42:13,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3585880.0, ans=0.125 2023-11-28 16:42:28,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537900 2023-11-28 16:42:30,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3585946.6666666665, ans=0.125 2023-11-28 16:42:34,102 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8850, loss[loss=0.08135, simple_loss=0.1179, pruned_loss=0.01739, audio_tagging_loss=0.005013, over 16088.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08972, pruned_loss=0.01196, audio_tagging_loss=0.008903, over 3060024.25 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:42:48,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3586080.0, ans=0.125 2023-11-28 16:42:50,905 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:43:02,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3586146.6666666665, ans=0.125 2023-11-28 16:43:13,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=22.5 2023-11-28 16:43:26,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3586280.0, ans=0.1 2023-11-28 16:43:27,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. 
limit=15.0 2023-11-28 16:43:28,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586280.0, ans=0.1 2023-11-28 16:43:30,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3586280.0, ans=0.125 2023-11-28 16:43:31,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 537950 2023-11-28 16:43:36,688 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8900, loss[loss=0.07195, simple_loss=0.1073, pruned_loss=0.01234, audio_tagging_loss=0.005953, over 16520.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09016, pruned_loss=0.01198, audio_tagging_loss=0.00877, over 3057712.97 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:43:49,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3586413.3333333335, ans=0.125 2023-11-28 16:43:59,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.009e+01 9.604e+01 1.041e+02 1.260e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 16:44:11,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3586480.0, ans=0.0 2023-11-28 16:44:23,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3586546.6666666665, ans=0.0 2023-11-28 16:44:33,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538000 2023-11-28 16:44:36,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586613.3333333335, ans=0.1 2023-11-28 16:44:39,017 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 8950, loss[loss=0.06444, simple_loss=0.08994, pruned_loss=0.01021, audio_tagging_loss=0.009261, over 15047.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08955, pruned_loss=0.01182, audio_tagging_loss=0.00865, over 3054342.04 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:44:55,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3586746.6666666665, ans=0.0 2023-11-28 16:45:14,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586813.3333333335, ans=0.1 2023-11-28 16:45:37,196 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538050 2023-11-28 16:45:41,947 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9000, loss[loss=0.06736, simple_loss=0.09514, pruned_loss=0.01303, audio_tagging_loss=0.006763, over 14537.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08994, pruned_loss=0.01197, audio_tagging_loss=0.008535, over 3053638.79 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:45:41,950 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 16:46:23,738 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05837, simple_loss=0.05051, pruned_loss=0.005241, audio_tagging_loss=0.02788, over 4681554.00 frames. 
2023-11-28 16:46:23,739 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 16:46:37,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3587080.0, ans=0.0 2023-11-28 16:46:39,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2023-11-28 16:46:43,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3587080.0, ans=0.125 2023-11-28 16:46:46,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.988e+01 9.549e+01 1.029e+02 1.340e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 16:46:51,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3587146.6666666665, ans=0.0 2023-11-28 16:47:01,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3587213.3333333335, ans=0.0 2023-11-28 16:47:08,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3587213.3333333335, ans=0.2 2023-11-28 16:47:20,953 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538100 2023-11-28 16:47:26,286 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9050, loss[loss=0.05446, simple_loss=0.06487, pruned_loss=0.0118, audio_tagging_loss=0.01023, over 16812.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08945, pruned_loss=0.01187, audio_tagging_loss=0.008477, over 3059480.05 frames. ], batch size: 65, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:47:30,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3587346.6666666665, ans=0.1 2023-11-28 16:47:39,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2023-11-28 16:47:45,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-28 16:47:53,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2023-11-28 16:48:02,956 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:48:03,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3587546.6666666665, ans=0.0 2023-11-28 16:48:23,552 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538150 2023-11-28 16:48:28,177 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9100, loss[loss=0.07237, simple_loss=0.1093, pruned_loss=0.01186, audio_tagging_loss=0.00584, over 15373.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08932, pruned_loss=0.01189, audio_tagging_loss=0.008467, over 3055495.65 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:48:31,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3587680.0, ans=0.0 2023-11-28 16:48:37,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3587680.0, ans=0.2 2023-11-28 16:48:52,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.776e+01 9.450e+01 1.014e+02 1.425e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 16:49:14,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-28 16:49:26,108 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538200 2023-11-28 16:49:30,989 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9150, loss[loss=0.06613, simple_loss=0.07927, pruned_loss=0.017, audio_tagging_loss=0.009499, over 14665.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08936, pruned_loss=0.01188, audio_tagging_loss=0.008434, over 3053105.78 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:49:31,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3588013.3333333335, ans=0.025 2023-11-28 16:49:33,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-11-28 16:49:51,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3588080.0, ans=0.125 2023-11-28 16:49:56,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3588146.6666666665, ans=0.125 2023-11-28 16:50:04,755 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:50:21,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3588280.0, ans=0.02 2023-11-28 16:50:28,724 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538250 2023-11-28 16:50:30,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3588280.0, ans=0.125 2023-11-28 16:50:33,284 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9200, loss[loss=0.08307, simple_loss=0.1144, pruned_loss=0.01755, audio_tagging_loss=0.008308, over 14818.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08964, pruned_loss=0.01201, audio_tagging_loss=0.008506, over 3052946.44 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:50:45,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3588413.3333333335, ans=0.0 2023-11-28 16:50:56,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.083e+01 9.538e+01 1.018e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 16:51:29,996 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538300 2023-11-28 16:51:35,090 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9250, loss[loss=0.05534, simple_loss=0.07437, pruned_loss=0.01016, audio_tagging_loss=0.007993, over 15225.00 frames. 
], tot_loss[loss=0.0647, simple_loss=0.08852, pruned_loss=0.01187, audio_tagging_loss=0.008569, over 3052720.43 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:51:35,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3588680.0, ans=0.1 2023-11-28 16:51:43,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0 2023-11-28 16:52:23,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3588880.0, ans=0.02 2023-11-28 16:52:26,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3588946.6666666665, ans=0.125 2023-11-28 16:52:34,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538350 2023-11-28 16:52:39,073 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9300, loss[loss=0.05269, simple_loss=0.07319, pruned_loss=0.00695, audio_tagging_loss=0.009149, over 16142.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.0889, pruned_loss=0.01193, audio_tagging_loss=0.008543, over 3047695.27 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:53:03,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.801e+01 8.822e+01 9.388e+01 1.037e+02 1.623e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 16:53:08,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3589146.6666666665, ans=0.125 2023-11-28 16:53:13,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-28 16:53:24,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3589213.3333333335, ans=0.125 2023-11-28 16:53:37,222 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538400 2023-11-28 16:53:39,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3589280.0, ans=0.0 2023-11-28 16:53:42,857 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9350, loss[loss=0.06308, simple_loss=0.08894, pruned_loss=0.01127, audio_tagging_loss=0.007342, over 15784.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08895, pruned_loss=0.01188, audio_tagging_loss=0.008488, over 3050450.45 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:53:44,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-28 16:53:50,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589346.6666666665, ans=0.1 2023-11-28 16:53:51,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-28 16:53:52,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3589346.6666666665, ans=0.0 2023-11-28 16:53:56,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3589413.3333333335, ans=0.125 2023-11-28 16:53:59,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589413.3333333335, ans=0.1 2023-11-28 16:54:34,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-28 16:54:38,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 16:54:40,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3589613.3333333335, ans=15.0 2023-11-28 16:54:40,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-28 16:54:45,228 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9400, loss[loss=0.09157, simple_loss=0.1365, pruned_loss=0.01689, audio_tagging_loss=0.006453, over 15505.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08908, pruned_loss=0.01187, audio_tagging_loss=0.008626, over 3048014.50 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:54:46,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3589680.0, ans=0.1 2023-11-28 16:55:11,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.693e+01 8.781e+01 9.524e+01 1.025e+02 1.257e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 16:55:42,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-28 16:55:44,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-28 16:55:47,172 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9450, loss[loss=0.06095, simple_loss=0.07907, pruned_loss=0.01141, audio_tagging_loss=0.01001, over 15772.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0895, pruned_loss=0.01193, audio_tagging_loss=0.008708, over 3043447.99 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:55:49,525 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:56:26,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3590213.3333333335, ans=0.1 2023-11-28 16:56:26,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-28 16:56:31,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-28 16:56:31,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3590213.3333333335, ans=0.0 2023-11-28 16:56:40,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3590280.0, ans=0.125 2023-11-28 16:56:45,138 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-28 16:56:49,800 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9500, loss[loss=0.08022, simple_loss=0.1142, pruned_loss=0.01502, audio_tagging_loss=0.008093, over 15272.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08979, pruned_loss=0.01199, audio_tagging_loss=0.008783, over 3044121.26 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:56:53,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=12.0 2023-11-28 16:57:16,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 9.072e+01 9.708e+01 1.059e+02 2.012e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 16:57:26,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3590546.6666666665, ans=0.0 2023-11-28 16:57:29,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3590546.6666666665, ans=0.1 2023-11-28 16:57:37,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3590546.6666666665, ans=0.04949747468305833 2023-11-28 16:57:47,657 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-28 16:57:52,590 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9550, loss[loss=0.06359, simple_loss=0.08232, pruned_loss=0.01244, audio_tagging_loss=0.009987, over 15427.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08942, pruned_loss=0.01188, audio_tagging_loss=0.008859, over 3049982.45 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:57:55,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.85 vs. 
limit=15.0 2023-11-28 16:57:56,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3590680.0, ans=0.0 2023-11-28 16:58:06,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-28 16:58:12,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3590746.6666666665, ans=15.0 2023-11-28 16:58:17,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3590813.3333333335, ans=0.125 2023-11-28 16:58:25,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3590813.3333333335, ans=0.1 2023-11-28 16:58:33,364 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:58:40,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3590880.0, ans=0.1 2023-11-28 16:58:50,197 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-28 16:58:51,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3590946.6666666665, ans=0.0 2023-11-28 16:58:55,028 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9600, loss[loss=0.06184, simple_loss=0.09085, pruned_loss=0.009205, audio_tagging_loss=0.00721, over 15734.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08928, pruned_loss=0.01181, audio_tagging_loss=0.00888, over 3048047.53 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:58:55,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3591013.3333333335, ans=0.2 2023-11-28 16:58:57,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3591013.3333333335, ans=0.2 2023-11-28 16:59:01,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3591013.3333333335, ans=0.0 2023-11-28 16:59:21,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 9.047e+01 9.589e+01 1.034e+02 1.302e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:59:32,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3591213.3333333335, ans=0.125 2023-11-28 16:59:36,947 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:59:49,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3591280.0, ans=0.125 2023-11-28 16:59:53,327 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-28 16:59:57,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3591346.6666666665, ans=0.0 2023-11-28 16:59:57,920 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9650, loss[loss=0.08157, simple_loss=0.1185, pruned_loss=0.01506, audio_tagging_loss=0.007272, over 15368.00 frames. 
], tot_loss[loss=0.06536, simple_loss=0.08948, pruned_loss=0.01181, audio_tagging_loss=0.008807, over 3037241.68 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:00:04,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3591346.6666666665, ans=0.2 2023-11-28 17:00:16,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-28 17:00:26,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3591480.0, ans=0.125 2023-11-28 17:00:50,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3591613.3333333335, ans=0.125 2023-11-28 17:00:52,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3591613.3333333335, ans=0.0 2023-11-28 17:00:54,600 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-28 17:00:54,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3591613.3333333335, ans=0.07 2023-11-28 17:00:59,824 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9700, loss[loss=0.06646, simple_loss=0.08902, pruned_loss=0.01043, audio_tagging_loss=0.01152, over 14799.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08995, pruned_loss=0.0121, audio_tagging_loss=0.008667, over 3034198.58 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:01:00,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2023-11-28 17:01:05,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3591680.0, ans=0.2 2023-11-28 17:01:26,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 8.979e+01 9.434e+01 1.003e+02 1.570e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 17:01:28,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3591813.3333333335, ans=0.1 2023-11-28 17:01:45,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3591880.0, ans=0.125 2023-11-28 17:01:56,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2023-11-28 17:01:57,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-28 17:01:57,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3591946.6666666665, ans=0.2 2023-11-28 17:02:02,760 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9750, loss[loss=0.07959, simple_loss=0.119, pruned_loss=0.01302, audio_tagging_loss=0.007051, over 15248.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08949, pruned_loss=0.01213, audio_tagging_loss=0.008555, over 3034981.97 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:02:02,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3592013.3333333335, ans=0.0 2023-11-28 17:02:06,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3592013.3333333335, ans=0.0 2023-11-28 17:02:23,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3592080.0, ans=0.0 2023-11-28 17:02:29,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-28 17:02:40,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3592213.3333333335, ans=0.125 2023-11-28 17:02:53,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3592280.0, ans=0.125 2023-11-28 17:02:56,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2023-11-28 17:02:57,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0 2023-11-28 17:02:59,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-28 17:03:04,996 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9800, loss[loss=0.0769, simple_loss=0.1039, pruned_loss=0.01405, audio_tagging_loss=0.01088, over 14968.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09008, pruned_loss=0.01233, audio_tagging_loss=0.00848, over 3035728.35 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:03:07,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3592346.6666666665, ans=0.1 2023-11-28 17:03:07,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3592346.6666666665, ans=0.125 2023-11-28 17:03:08,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3592346.6666666665, ans=0.2 2023-11-28 17:03:19,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3592413.3333333335, ans=0.125 2023-11-28 17:03:20,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3592413.3333333335, ans=0.0 2023-11-28 17:03:25,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3592413.3333333335, ans=0.0 2023-11-28 17:03:31,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.992e+01 9.640e+01 1.016e+02 2.169e+02, threshold=1.928e+02, percent-clipped=1.0 2023-11-28 17:03:42,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. 
limit=15.0 2023-11-28 17:03:55,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3592613.3333333335, ans=0.1 2023-11-28 17:03:55,288 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:03:57,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3592613.3333333335, ans=15.0 2023-11-28 17:04:02,234 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-28 17:04:04,666 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:04:07,528 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9850, loss[loss=0.07351, simple_loss=0.1021, pruned_loss=0.01273, audio_tagging_loss=0.009702, over 15148.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08993, pruned_loss=0.0123, audio_tagging_loss=0.008464, over 3036715.42 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:04:23,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3592746.6666666665, ans=0.2 2023-11-28 17:04:32,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3592813.3333333335, ans=0.125 2023-11-28 17:05:05,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-28 17:05:10,985 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9900, loss[loss=0.08079, simple_loss=0.1096, pruned_loss=0.01707, audio_tagging_loss=0.008929, over 14964.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.0912, pruned_loss=0.01266, audio_tagging_loss=0.008386, over 3046313.00 frames. 
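
The optim.py:476 records print five grad-norm quantiles (min, 25%, 50%, 75%, max) together with Clipping_scale=2.0 and a threshold, and in each record here the threshold is exactly twice the logged median: 2 x 9.640e+01 = 1.928e+02 at 17:03:31, and 2 x 9.894e+01 = 1.979e+02 at 17:05:36 below. A sketch of that apparent rule; how the optimizer actually maintains the statistics is not visible in the log, so treat this as an inference:

import torch

def clip_threshold(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    # Assumed rule inferred from the log: threshold = clipping_scale * median grad norm.
    return clipping_scale * recent_norms.median().item()

# Quantiles logged at 17:03:31 -> median 9.640e+01, logged threshold 1.928e+02.
norms = torch.tensor([75.38, 89.92, 96.40, 101.6, 216.9])
print(round(clip_threshold(norms), 1))  # 192.8
# percent-clipped would then be the share of recent norms above the threshold.
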
], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:05:26,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3593080.0, ans=0.0 2023-11-28 17:05:27,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593080.0, ans=0.1 2023-11-28 17:05:28,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3593080.0, ans=0.0 2023-11-28 17:05:30,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3593080.0, ans=0.125 2023-11-28 17:05:36,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.160e+01 9.894e+01 1.082e+02 1.663e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 17:05:52,059 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:05:52,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3593213.3333333335, ans=0.125 2023-11-28 17:05:53,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-28 17:06:08,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-28 17:06:14,252 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 9950, loss[loss=0.08252, simple_loss=0.1172, pruned_loss=0.01613, audio_tagging_loss=0.007791, over 14880.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0913, pruned_loss=0.01278, audio_tagging_loss=0.008429, over 3045149.70 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:06:18,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-28 17:06:28,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593413.3333333335, ans=0.125 2023-11-28 17:06:58,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3593546.6666666665, ans=0.125 2023-11-28 17:07:10,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-28 17:07:15,476 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10000, loss[loss=0.06808, simple_loss=0.09756, pruned_loss=0.01232, audio_tagging_loss=0.006986, over 14116.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09031, pruned_loss=0.01258, audio_tagging_loss=0.008436, over 3043837.28 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:07:18,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3593680.0, ans=0.0 2023-11-28 17:07:22,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. 
limit=8.0 2023-11-28 17:07:38,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3593746.6666666665, ans=0.0 2023-11-28 17:07:42,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.648e+01 9.149e+01 9.983e+01 1.212e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-28 17:07:51,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2023-11-28 17:07:52,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593880.0, ans=0.1 2023-11-28 17:08:13,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-28 17:08:18,077 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10050, loss[loss=0.06695, simple_loss=0.09306, pruned_loss=0.01239, audio_tagging_loss=0.008034, over 15780.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09084, pruned_loss=0.01258, audio_tagging_loss=0.008478, over 3041192.21 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:08:26,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2023-11-28 17:08:27,887 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:08:43,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3594146.6666666665, ans=0.0 2023-11-28 17:08:56,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3594213.3333333335, ans=0.0 2023-11-28 17:09:16,315 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-28 17:09:21,029 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10100, loss[loss=0.07109, simple_loss=0.08436, pruned_loss=0.01761, audio_tagging_loss=0.0113, over 15501.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09068, pruned_loss=0.01244, audio_tagging_loss=0.008442, over 3040771.00 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:09:22,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0 2023-11-28 17:09:27,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3594346.6666666665, ans=0.0 2023-11-28 17:09:43,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3594413.3333333335, ans=0.125 2023-11-28 17:09:48,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.910e+01 9.610e+01 1.020e+02 1.223e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:10:00,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2023-11-28 17:10:16,243 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:10:18,863 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-28 17:10:20,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3594613.3333333335, ans=0.125 2023-11-28 17:10:23,936 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10150, loss[loss=0.06847, simple_loss=0.09604, pruned_loss=0.01143, audio_tagging_loss=0.009025, over 16276.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09114, pruned_loss=0.0124, audio_tagging_loss=0.008528, over 3042984.53 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:10:35,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3594746.6666666665, ans=15.0 2023-11-28 17:10:46,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3594746.6666666665, ans=0.2 2023-11-28 17:10:50,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3594813.3333333335, ans=0.2 2023-11-28 17:10:55,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2023-11-28 17:10:57,627 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:11:07,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3594880.0, ans=0.0 2023-11-28 17:11:17,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-11-28 17:11:21,527 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-28 17:11:25,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3595013.3333333335, ans=0.125 2023-11-28 17:11:26,362 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10200, loss[loss=0.07634, simple_loss=0.103, pruned_loss=0.0131, audio_tagging_loss=0.01174, over 14043.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09064, pruned_loss=0.01235, audio_tagging_loss=0.008681, over 3044661.75 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:11:30,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3595013.3333333335, ans=0.0 2023-11-28 17:11:32,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3595013.3333333335, ans=0.125 2023-11-28 17:11:37,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. 
limit=15.0 2023-11-28 17:11:46,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2023-11-28 17:11:47,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3595080.0, ans=0.2 2023-11-28 17:11:52,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3595146.6666666665, ans=0.0 2023-11-28 17:11:54,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.974e+01 9.604e+01 1.042e+02 1.393e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:11:54,386 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:12:24,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-28 17:12:25,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3595280.0, ans=0.2 2023-11-28 17:12:26,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3595280.0, ans=0.0 2023-11-28 17:12:28,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.52 vs. limit=22.5 2023-11-28 17:12:28,740 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10250, loss[loss=0.0615, simple_loss=0.0776, pruned_loss=0.01226, audio_tagging_loss=0.01044, over 14969.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.0904, pruned_loss=0.0123, audio_tagging_loss=0.008772, over 3044005.55 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:12:41,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3595413.3333333335, ans=0.1 2023-11-28 17:12:43,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3595413.3333333335, ans=0.125 2023-11-28 17:12:44,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3595413.3333333335, ans=0.2 2023-11-28 17:12:44,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3595413.3333333335, ans=0.0 2023-11-28 17:12:59,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2023-11-28 17:13:03,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3595480.0, ans=0.1 2023-11-28 17:13:12,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. 
limit=15.0 2023-11-28 17:13:27,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-28 17:13:28,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3595613.3333333335, ans=0.09899494936611666 2023-11-28 17:13:30,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3595680.0, ans=0.125 2023-11-28 17:13:31,875 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10300, loss[loss=0.06059, simple_loss=0.07625, pruned_loss=0.01218, audio_tagging_loss=0.01029, over 14780.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08874, pruned_loss=0.01212, audio_tagging_loss=0.008863, over 3044054.23 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:13:32,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3595680.0, ans=0.2 2023-11-28 17:13:36,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3595680.0, ans=0.125 2023-11-28 17:13:44,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2023-11-28 17:13:50,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3595746.6666666665, ans=0.0 2023-11-28 17:13:53,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2023-11-28 17:13:59,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 9.050e+01 9.766e+01 1.043e+02 1.224e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-28 17:14:14,782 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:14:24,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3595946.6666666665, ans=0.1 2023-11-28 17:14:29,311 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-28 17:14:34,246 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10350, loss[loss=0.06122, simple_loss=0.0834, pruned_loss=0.01002, audio_tagging_loss=0.0095, over 15704.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08854, pruned_loss=0.01211, audio_tagging_loss=0.008967, over 3049459.01 frames. 
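
About the "Exclude cut" WARNINGs in the records above: each excluded AudioSet placeholder cut has 100 feature frames, which become 23 frames after the frontend's 4x subsampling, one fewer than the 24 BPE tokens of the dummy transcript; a transducer cannot align more tokens than it has output frames, so the cut is dropped. The arithmetic below assumes the zipformer-style frontend formula ((T - 7) // 2 + 1) // 2 (an assumption; the recipe's exact convolution geometry may differ slightly):

def frames_after_subsampling(t: int) -> int:
    # Assumed Conv2d frontend: two stride-2 stages with a 7-frame kernel overhead.
    return ((t - 7) // 2 + 1) // 2

t_in = 100                  # frames before subsampling, from the WARNING
num_tokens = 24             # tokens of the dummy transcript, from the WARNING
t_out = frames_after_subsampling(t_in)
print(t_out)                # 23
print(t_out < num_tokens)   # True -> cut is excluded from training
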
], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:14:45,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3596080.0, ans=0.125 2023-11-28 17:14:50,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3596080.0, ans=0.2 2023-11-28 17:15:07,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3596146.6666666665, ans=0.125 2023-11-28 17:15:17,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3596213.3333333335, ans=0.125 2023-11-28 17:15:23,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3596280.0, ans=0.2 2023-11-28 17:15:24,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596280.0, ans=0.1 2023-11-28 17:15:30,074 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-28 17:15:31,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2023-11-28 17:15:34,662 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10400, loss[loss=0.08392, simple_loss=0.1193, pruned_loss=0.01817, audio_tagging_loss=0.006104, over 15715.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0896, pruned_loss=0.01217, audio_tagging_loss=0.009026, over 3044209.18 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:16:01,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.937e+01 9.027e+01 9.708e+01 1.021e+02 1.407e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 17:16:32,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-28 17:16:36,762 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10450, loss[loss=0.05444, simple_loss=0.07333, pruned_loss=0.01071, audio_tagging_loss=0.007072, over 15198.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08905, pruned_loss=0.01212, audio_tagging_loss=0.008922, over 3043806.87 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:16:38,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3596680.0, ans=0.2 2023-11-28 17:16:38,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596680.0, ans=0.1 2023-11-28 17:16:55,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.28 vs. 
limit=15.0 2023-11-28 17:17:17,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3596880.0, ans=0.2 2023-11-28 17:17:23,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3596880.0, ans=0.0 2023-11-28 17:17:33,747 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-28 17:17:34,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596946.6666666665, ans=0.1 2023-11-28 17:17:38,929 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10500, loss[loss=0.07793, simple_loss=0.1147, pruned_loss=0.01416, audio_tagging_loss=0.006423, over 15855.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.0891, pruned_loss=0.01215, audio_tagging_loss=0.008753, over 3048655.77 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:18:03,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3597146.6666666665, ans=0.5 2023-11-28 17:18:05,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3597146.6666666665, ans=0.125 2023-11-28 17:18:06,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.931e+01 9.605e+01 1.033e+02 2.073e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 17:18:35,980 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-28 17:18:36,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3597280.0, ans=0.125 2023-11-28 17:18:40,854 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10550, loss[loss=0.05387, simple_loss=0.06789, pruned_loss=0.009794, audio_tagging_loss=0.01014, over 15438.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08917, pruned_loss=0.01204, audio_tagging_loss=0.008616, over 3044020.40 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:18:57,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3597413.3333333335, ans=0.125 2023-11-28 17:18:57,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-28 17:19:22,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3597546.6666666665, ans=0.125 2023-11-28 17:19:37,860 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-28 17:19:41,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3597680.0, ans=0.1 2023-11-28 17:19:42,589 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10600, loss[loss=0.09627, simple_loss=0.1319, pruned_loss=0.02533, audio_tagging_loss=0.004994, over 14741.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08908, pruned_loss=0.01203, audio_tagging_loss=0.008593, over 3040483.73 frames. 
], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:20:11,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.139e+01 8.954e+01 9.589e+01 1.025e+02 1.251e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 17:20:13,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-28 17:20:14,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-28 17:20:35,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597946.6666666665, ans=0.1 2023-11-28 17:20:40,087 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-28 17:20:45,336 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10650, loss[loss=0.05995, simple_loss=0.07452, pruned_loss=0.01223, audio_tagging_loss=0.01046, over 14450.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08965, pruned_loss=0.01214, audio_tagging_loss=0.008509, over 3039523.80 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:20:49,062 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:20:52,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3598013.3333333335, ans=0.125 2023-11-28 17:21:31,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3598213.3333333335, ans=0.2 2023-11-28 17:21:41,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.59 vs. limit=15.0 2023-11-28 17:21:41,976 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-28 17:21:47,286 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10700, loss[loss=0.09756, simple_loss=0.1383, pruned_loss=0.02271, audio_tagging_loss=0.005682, over 15639.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09016, pruned_loss=0.01212, audio_tagging_loss=0.008468, over 3043910.69 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:21:54,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3598346.6666666665, ans=15.0 2023-11-28 17:22:08,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3598413.3333333335, ans=0.2 2023-11-28 17:22:12,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. 
limit=15.0 2023-11-28 17:22:15,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.739e+01 9.237e+01 1.012e+02 1.304e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 17:22:43,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-28 17:22:48,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3598680.0, ans=0.1 2023-11-28 17:22:48,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3598680.0, ans=0.125 2023-11-28 17:22:49,066 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10750, loss[loss=0.07484, simple_loss=0.11, pruned_loss=0.01372, audio_tagging_loss=0.006139, over 16500.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09009, pruned_loss=0.01202, audio_tagging_loss=0.008494, over 3049135.51 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:22:52,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3598680.0, ans=0.0 2023-11-28 17:22:58,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-28 17:23:07,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3598746.6666666665, ans=22.5 2023-11-28 17:23:32,727 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:23:45,909 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-28 17:23:51,164 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10800, loss[loss=0.07668, simple_loss=0.1031, pruned_loss=0.01826, audio_tagging_loss=0.006854, over 15415.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08941, pruned_loss=0.01195, audio_tagging_loss=0.00853, over 3049934.74 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:23:52,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-28 17:24:01,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599013.3333333335, ans=0.1 2023-11-28 17:24:04,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=12.0 2023-11-28 17:24:19,466 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.985e+01 9.429e+01 1.046e+02 1.643e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 17:24:31,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. 
limit=15.0 2023-11-28 17:24:33,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3599213.3333333335, ans=0.125 2023-11-28 17:24:48,020 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-28 17:24:51,713 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:24:53,264 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10850, loss[loss=0.0712, simple_loss=0.09694, pruned_loss=0.01448, audio_tagging_loss=0.008251, over 15646.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.0907, pruned_loss=0.01217, audio_tagging_loss=0.008488, over 3051810.20 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:24:58,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3599346.6666666665, ans=0.2 2023-11-28 17:24:58,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3599346.6666666665, ans=12.0 2023-11-28 17:25:12,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3599413.3333333335, ans=0.0 2023-11-28 17:25:44,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3599613.3333333335, ans=0.0 2023-11-28 17:25:49,847 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-28 17:25:54,403 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10900, loss[loss=0.05537, simple_loss=0.07624, pruned_loss=0.007407, audio_tagging_loss=0.009839, over 15909.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09048, pruned_loss=0.0122, audio_tagging_loss=0.008524, over 3058975.73 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:25:54,459 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 17:25:54,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3599680.0, ans=0.125 2023-11-28 17:25:59,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3599680.0, ans=0.125 2023-11-28 17:26:18,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-28 17:26:23,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.863e+01 9.525e+01 1.011e+02 1.256e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 17:26:23,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-28 17:26:31,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3599880.0, ans=0.125 2023-11-28 17:26:36,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2023-11-28 17:26:51,472 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-28 17:26:58,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3600013.3333333335, ans=0.0 2023-11-28 17:26:58,870 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 10950, loss[loss=0.05453, simple_loss=0.07244, pruned_loss=0.007466, audio_tagging_loss=0.01084, over 16111.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08994, pruned_loss=0.01207, audio_tagging_loss=0.008556, over 3050353.37 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:27:15,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3600080.0, ans=0.125 2023-11-28 17:27:24,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0 2023-11-28 17:27:29,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3600146.6666666665, ans=0.125 2023-11-28 17:27:30,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3600146.6666666665, ans=0.0 2023-11-28 17:27:55,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3600280.0, ans=0.125 2023-11-28 17:27:56,603 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-28 17:28:01,202 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11000, loss[loss=0.06343, simple_loss=0.08611, pruned_loss=0.009752, audio_tagging_loss=0.01062, over 14421.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08955, pruned_loss=0.0119, audio_tagging_loss=0.00865, over 3039984.55 frames. 
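
grad_scale in the tot_loss records flips between 16.0 and 32.0 (32.0 through batch 10950 above, back to 16.0 at batch 11000 below). That is the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and is halved as soon as an inf/nan gradient appears. A sketch with PyTorch's stock GradScaler purely to illustrate the mechanism; icefall wires up its own scaler state, so every parameter below is an assumption:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # the smaller of the two logged values (assumption)
    growth_factor=2.0,     # 16.0 -> 32.0 after growth_interval clean steps
    backoff_factor=0.5,    # 32.0 -> 16.0 on an inf/nan gradient
    growth_interval=2000,  # clean steps required before growing (assumption)
)
# Per step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
print(scaler.get_scale())  # 16.0 on a CUDA machine
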
], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:28:01,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600346.6666666665, ans=0.1 2023-11-28 17:28:05,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3600346.6666666665, ans=0.125 2023-11-28 17:28:14,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2023-11-28 17:28:15,680 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:28:25,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3600480.0, ans=0.2 2023-11-28 17:28:29,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3600480.0, ans=0.09899494936611666 2023-11-28 17:28:30,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.952e+01 9.649e+01 1.058e+02 1.351e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 17:28:30,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3600480.0, ans=0.125 2023-11-28 17:28:40,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3600546.6666666665, ans=0.125 2023-11-28 17:28:42,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3600546.6666666665, ans=0.025 2023-11-28 17:28:43,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3600546.6666666665, ans=0.125 2023-11-28 17:28:54,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3600613.3333333335, ans=0.1 2023-11-28 17:28:58,190 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-28 17:29:02,660 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11050, loss[loss=0.06704, simple_loss=0.09046, pruned_loss=0.01298, audio_tagging_loss=0.008829, over 14718.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08952, pruned_loss=0.01198, audio_tagging_loss=0.00875, over 3033483.43 frames. 
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:29:36,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3600813.3333333335, ans=0.125 2023-11-28 17:29:57,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3600946.6666666665, ans=0.2 2023-11-28 17:29:57,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3600946.6666666665, ans=0.0 2023-11-28 17:29:59,784 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-28 17:30:04,339 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11100, loss[loss=0.05273, simple_loss=0.07103, pruned_loss=0.009304, audio_tagging_loss=0.007914, over 15137.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08926, pruned_loss=0.01203, audio_tagging_loss=0.008832, over 3041668.16 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:30:06,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3601013.3333333335, ans=0.125 2023-11-28 17:30:17,103 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:30:34,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.960e+01 9.547e+01 1.044e+02 1.303e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 17:30:59,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3601280.0, ans=0.2 2023-11-28 17:31:01,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-28 17:31:06,508 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11150, loss[loss=0.05747, simple_loss=0.07993, pruned_loss=0.006645, audio_tagging_loss=0.01086, over 13425.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08964, pruned_loss=0.01208, audio_tagging_loss=0.008842, over 3044993.51 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:31:20,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3601413.3333333335, ans=0.2 2023-11-28 17:31:22,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=12.0 2023-11-28 17:31:33,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3601480.0, ans=0.125 2023-11-28 17:31:40,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2023-11-28 17:31:53,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3601546.6666666665, ans=0.1 2023-11-28 17:32:04,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-28 17:32:08,648 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11200, loss[loss=0.05232, simple_loss=0.07013, pruned_loss=0.005883, audio_tagging_loss=0.01137, over 16490.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08963, pruned_loss=0.01211, audio_tagging_loss=0.008865, over 3040030.13 frames. 
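
The scaling.py:213 ScheduledFloat records (name=..., batch_count=..., ans=...) show that dropout probabilities, skip rates and balancer bounds are not constants but values looked up as a function of the global batch count. A minimal sketch of such a schedule, piecewise-linear between (batch_count, value) breakpoints; the linear interpolation is an assumption, and only the batch-count-indexed lookup is evident from the log:

from bisect import bisect_right

class ScheduledFloat:
    """A float-valued hyperparameter defined by (batch_count, value) breakpoints."""
    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # illustrative breakpoints
print(dropout_p(3_600_000))  # 0.1 -- flat at the final value this deep into training,
                             # matching the constant ans=0.1 on the dropout_p lines above
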
], batch size: 66, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:32:38,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.987e+01 9.522e+01 1.045e+02 1.448e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 17:32:54,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3601880.0, ans=0.125 2023-11-28 17:33:05,166 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-28 17:33:09,786 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11250, loss[loss=0.07441, simple_loss=0.09946, pruned_loss=0.01653, audio_tagging_loss=0.00815, over 14936.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08939, pruned_loss=0.01224, audio_tagging_loss=0.008895, over 3040016.93 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:33:14,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3602013.3333333335, ans=0.0 2023-11-28 17:33:27,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3602080.0, ans=0.125 2023-11-28 17:33:28,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3602080.0, ans=0.2 2023-11-28 17:33:40,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3602146.6666666665, ans=0.1 2023-11-28 17:34:07,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-28 17:34:07,457 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:34:11,700 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11300, loss[loss=0.05625, simple_loss=0.07503, pruned_loss=0.007057, audio_tagging_loss=0.01168, over 15912.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08913, pruned_loss=0.01224, audio_tagging_loss=0.00882, over 3047163.74 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:34:12,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2023-11-28 17:34:19,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3602346.6666666665, ans=0.125 2023-11-28 17:34:25,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3602413.3333333335, ans=0.015 2023-11-28 17:34:38,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. 
limit=15.0 2023-11-28 17:34:41,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.983e+01 9.542e+01 1.053e+02 1.409e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 17:34:45,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3602480.0, ans=0.2 2023-11-28 17:34:45,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3602480.0, ans=0.0 2023-11-28 17:35:05,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3602613.3333333335, ans=0.125 2023-11-28 17:35:06,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3602613.3333333335, ans=0.2 2023-11-28 17:35:08,578 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-28 17:35:08,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3602613.3333333335, ans=0.0 2023-11-28 17:35:08,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3602613.3333333335, ans=0.1 2023-11-28 17:35:14,240 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11350, loss[loss=0.05112, simple_loss=0.06516, pruned_loss=0.009095, audio_tagging_loss=0.009448, over 14827.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08909, pruned_loss=0.0121, audio_tagging_loss=0.008756, over 3038089.08 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:35:45,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3602813.3333333335, ans=0.025 2023-11-28 17:35:49,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3602880.0, ans=0.125 2023-11-28 17:35:53,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3602880.0, ans=0.0 2023-11-28 17:36:07,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2023-11-28 17:36:07,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0 2023-11-28 17:36:11,259 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-28 17:36:14,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3603013.3333333335, ans=0.1 2023-11-28 17:36:15,781 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11400, loss[loss=0.06431, simple_loss=0.08603, pruned_loss=0.0122, audio_tagging_loss=0.009089, over 14955.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08869, pruned_loss=0.01214, audio_tagging_loss=0.008617, over 3033948.21 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:36:19,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. 
limit=10.0 2023-11-28 17:36:47,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.040e+01 9.661e+01 1.043e+02 1.391e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 17:37:02,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3603213.3333333335, ans=0.0 2023-11-28 17:37:05,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3603280.0, ans=0.0 2023-11-28 17:37:08,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2023-11-28 17:37:12,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-28 17:37:18,036 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11450, loss[loss=0.07733, simple_loss=0.1131, pruned_loss=0.01548, audio_tagging_loss=0.005301, over 14488.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08917, pruned_loss=0.01221, audio_tagging_loss=0.008588, over 3038432.05 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:37:37,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3603413.3333333335, ans=0.125 2023-11-28 17:37:55,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3603546.6666666665, ans=0.2 2023-11-28 17:37:55,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2023-11-28 17:38:15,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-28 17:38:19,860 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11500, loss[loss=0.04054, simple_loss=0.04814, pruned_loss=0.007161, audio_tagging_loss=0.009311, over 14994.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08813, pruned_loss=0.01202, audio_tagging_loss=0.008635, over 3039767.22 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:38:25,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3603680.0, ans=0.2 2023-11-28 17:38:30,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3603680.0, ans=0.0 2023-11-28 17:38:32,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3603746.6666666665, ans=0.035 2023-11-28 17:38:50,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-28 17:38:50,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.641e+01 9.267e+01 1.033e+02 1.350e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 17:39:02,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3603880.0, ans=0.2 2023-11-28 17:39:17,491 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-28 17:39:22,436 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11550, loss[loss=0.07075, simple_loss=0.1019, pruned_loss=0.01432, audio_tagging_loss=0.005486, over 15293.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08815, pruned_loss=0.01213, audio_tagging_loss=0.008667, over 3037750.90 frames. 
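
The scaling.py:1022 Whitening records compare a measured metric with a limit (e.g. metric=9.44 vs. limit=12.0 a few records below) and are purely informational while the metric stays under the limit; a penalty engages only on the excess. The sketch below shows just that gating shape with an illustrative weight; the actual metric and penalty used by the Whiten module are not recoverable from the log, so this is an assumption:

import torch

def whiten_penalty(metric: torch.Tensor, limit: float, weight: float = 0.01) -> torch.Tensor:
    # Assumed gating: zero while metric <= limit, proportional to the excess above it.
    return weight * (metric - limit).clamp(min=0.0)

print(whiten_penalty(torch.tensor(9.44), limit=12.0))  # tensor(0.) -- under the limit
print(whiten_penalty(torch.tensor(13.2), limit=12.0))  # tensor(0.0120) -- over the limit
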
], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:39:25,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3604013.3333333335, ans=15.0 2023-11-28 17:39:28,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2023-11-28 17:39:41,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2023-11-28 17:40:02,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3604213.3333333335, ans=0.2 2023-11-28 17:40:05,157 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:40:10,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3604213.3333333335, ans=0.125 2023-11-28 17:40:18,798 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-28 17:40:23,074 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11600, loss[loss=0.07229, simple_loss=0.09877, pruned_loss=0.0135, audio_tagging_loss=0.009401, over 15019.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.0886, pruned_loss=0.01216, audio_tagging_loss=0.008637, over 3042761.19 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:40:39,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2023-11-28 17:40:44,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3604413.3333333335, ans=0.125 2023-11-28 17:40:55,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.847e+01 9.637e+01 1.017e+02 1.289e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 17:41:11,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3604546.6666666665, ans=0.0 2023-11-28 17:41:21,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-28 17:41:25,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604680.0, ans=0.1 2023-11-28 17:41:26,728 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11650, loss[loss=0.05336, simple_loss=0.067, pruned_loss=0.008954, audio_tagging_loss=0.0109, over 16369.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09011, pruned_loss=0.01237, audio_tagging_loss=0.008554, over 3038766.62 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:41:31,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. 
limit=12.0 2023-11-28 17:41:32,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3604680.0, ans=0.125 2023-11-28 17:42:10,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3604880.0, ans=0.0 2023-11-28 17:42:14,434 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:42:22,904 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-28 17:42:28,525 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11700, loss[loss=0.0478, simple_loss=0.06294, pruned_loss=0.006463, audio_tagging_loss=0.009867, over 15491.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08971, pruned_loss=0.01225, audio_tagging_loss=0.008584, over 3043502.97 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:42:29,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3605013.3333333335, ans=0.0 2023-11-28 17:42:35,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3605013.3333333335, ans=0.0 2023-11-28 17:42:52,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3605146.6666666665, ans=0.0 2023-11-28 17:42:55,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3605146.6666666665, ans=0.125 2023-11-28 17:42:58,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 9.057e+01 9.679e+01 1.035e+02 1.386e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 17:43:02,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2023-11-28 17:43:08,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3605213.3333333335, ans=0.1 2023-11-28 17:43:19,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2023-11-28 17:43:24,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-28 17:43:29,711 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11750, loss[loss=0.0845, simple_loss=0.1214, pruned_loss=0.01857, audio_tagging_loss=0.005234, over 15703.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0894, pruned_loss=0.01236, audio_tagging_loss=0.008597, over 3037766.76 frames. 
2023-11-28 17:43:41,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3605413.3333333335, ans=0.125
2023-11-28 17:43:47,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3605413.3333333335, ans=0.05
2023-11-28 17:44:09,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3605546.6666666665, ans=0.1
2023-11-28 17:44:10,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0
2023-11-28 17:44:25,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0
2023-11-28 17:44:27,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540850
2023-11-28 17:44:32,153 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11800, loss[loss=0.06601, simple_loss=0.08903, pruned_loss=0.01151, audio_tagging_loss=0.00999, over 16161.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08874, pruned_loss=0.01221, audio_tagging_loss=0.008694, over 3044780.95 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0
2023-11-28 17:44:50,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3605746.6666666665, ans=0.125
2023-11-28 17:45:01,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3605813.3333333335, ans=0.125
2023-11-28 17:45:02,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.699e+01 9.349e+01 9.967e+01 1.294e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-28 17:45:14,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5
2023-11-28 17:45:28,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540900
2023-11-28 17:45:28,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3605946.6666666665, ans=0.125
2023-11-28 17:45:33,920 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11850, loss[loss=0.06346, simple_loss=0.08989, pruned_loss=0.01015, audio_tagging_loss=0.008365, over 16525.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08854, pruned_loss=0.01212, audio_tagging_loss=0.008737, over 3039840.18 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0
2023-11-28 17:45:48,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3606080.0, ans=0.0
2023-11-28 17:45:57,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0
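The `optim.py` lines report how the optimizer's gradient clipping is behaving: the five numbers after `grad-norm quartiles` are the min/25%/median/75%/max of recently observed gradient norms, and in every entry above `threshold` equals `Clipping_scale` (2.0) times the median (e.g. 9.349e+01 x 2 = 1.870e+02). A hedged sketch of that bookkeeping, not the optimizer's actual internals:

```python
import torch

# Sketch of the clipping statistics in the optim.py lines: given a window of
# recently seen gradient norms, the threshold is clipping_scale times their
# median, and percent-clipped is the share of batches whose norm exceeded it.
def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    quartiles = torch.quantile(
        recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    threshold = clipping_scale * quartiles[2]  # 2.0 x the median norm
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```

Read against the entry above: with a median near 9.3e+01 and a maximum of 1.294e+02, no recent gradient norm exceeds the 1.870e+02 threshold, hence `percent-clipped=0.0`.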
2023-11-28 17:46:07,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3606146.6666666665, ans=0.125
2023-11-28 17:46:22,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3606280.0, ans=0.015
2023-11-28 17:46:30,848 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 540950
2023-11-28 17:46:30,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3606280.0, ans=0.0
2023-11-28 17:46:33,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3606280.0, ans=0.0
2023-11-28 17:46:35,558 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11900, loss[loss=0.07967, simple_loss=0.1055, pruned_loss=0.02018, audio_tagging_loss=0.006762, over 15187.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08839, pruned_loss=0.01199, audio_tagging_loss=0.008768, over 3042332.48 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0
2023-11-28 17:46:44,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3606346.6666666665, ans=0.2
2023-11-28 17:47:06,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.830e+01 8.976e+01 9.494e+01 1.024e+02 1.214e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-28 17:47:21,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=22.5
2023-11-28 17:47:33,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541000
2023-11-28 17:47:38,220 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 11950, loss[loss=0.07432, simple_loss=0.09839, pruned_loss=0.01571, audio_tagging_loss=0.009422, over 15007.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08844, pruned_loss=0.01192, audio_tagging_loss=0.008887, over 3036789.83 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0
2023-11-28 17:47:46,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3606680.0, ans=0.125
2023-11-28 17:47:48,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3606680.0, ans=0.1
2023-11-28 17:47:58,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606746.6666666665, ans=0.1
2023-11-28 17:47:58,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3606746.6666666665, ans=0.04949747468305833
2023-11-28 17:48:01,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.31 vs.
limit=12.0 2023-11-28 17:48:20,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3606880.0, ans=0.0 2023-11-28 17:48:29,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3606946.6666666665, ans=0.0 2023-11-28 17:48:33,792 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-28 17:48:38,219 INFO [train_asr.py:1235] (1/4) Epoch 45, batch 12000, loss[loss=0.07842, simple_loss=0.09856, pruned_loss=0.01842, audio_tagging_loss=0.01073, over 15492.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08894, pruned_loss=0.01201, audio_tagging_loss=0.008962, over 3038107.51 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:48:38,220 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 17:49:16,760 INFO [train_asr.py:1267] (1/4) Epoch 45, validation: loss=0.05759, simple_loss=0.05051, pruned_loss=0.005251, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-28 17:49:16,760 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 17:49:16,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3607013.3333333335, ans=0.125 2023-11-28 17:49:19,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3607013.3333333335, ans=0.1 2023-11-28 17:49:27,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3607080.0, ans=0.1 2023-11-28 17:49:32,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3607080.0, ans=0.2 2023-11-28 17:49:34,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:49:36,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:49:37,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3607080.0, ans=0.0 2023-11-28 17:50:04,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-28 17:50:04,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-28 17:50:05,876 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 0, loss[loss=0.07337, simple_loss=0.08875, pruned_loss=0.01202, audio_tagging_loss=0.01696, over 16361.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.08875, pruned_loss=0.01202, audio_tagging_loss=0.01696, over 16361.00 frames. 
], batch size: 62, lr: 1.48e-03, grad_scale: 32.0 2023-11-28 17:50:05,877 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 17:50:19,479 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6519, 5.0602, 5.4831, 4.7371], device='cuda:1') 2023-11-28 17:50:20,539 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0318, 3.6891, 3.6381, 3.4649, 4.2129, 4.2182, 4.3280, 4.1636], device='cuda:1') 2023-11-28 17:50:41,975 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05787, simple_loss=0.05054, pruned_loss=0.005286, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-28 17:50:41,975 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 17:50:42,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-28 17:50:43,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.886e+01 9.608e+01 1.034e+02 1.479e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:50:44,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-28 17:50:44,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3607186.6666666665, ans=0.1 2023-11-28 17:51:01,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3607253.3333333335, ans=0.0 2023-11-28 17:51:01,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3607253.3333333335, ans=0.125 2023-11-28 17:51:06,767 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-28 17:51:43,544 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 50, loss[loss=0.08497, simple_loss=0.1061, pruned_loss=0.01446, audio_tagging_loss=0.01746, over 15795.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.09254, pruned_loss=0.01232, audio_tagging_loss=0.01614, over 687199.12 frames. ], batch size: 58, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:51:53,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-28 17:52:07,760 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-28 17:52:11,450 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:52:17,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-28 17:52:25,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3607720.0, ans=0.125 2023-11-28 17:52:39,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.17 vs. limit=22.5 2023-11-28 17:52:44,952 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 100, loss[loss=0.05687, simple_loss=0.05941, pruned_loss=0.008499, audio_tagging_loss=0.01867, over 15869.00 frames. 
], tot_loss[loss=0.07319, simple_loss=0.09073, pruned_loss=0.01223, audio_tagging_loss=0.01559, over 1208342.45 frames. ], batch size: 61, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:52:47,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.554e+01 1.000e+02 1.063e+02 1.121e+02 1.597e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-28 17:52:49,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2023-11-28 17:52:58,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3607920.0, ans=0.0 2023-11-28 17:53:10,064 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-28 17:53:11,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0 2023-11-28 17:53:14,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607986.6666666665, ans=0.125 2023-11-28 17:53:23,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3608053.3333333335, ans=0.0 2023-11-28 17:53:36,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3608120.0, ans=0.125 2023-11-28 17:53:43,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608120.0, ans=0.125 2023-11-28 17:53:47,633 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 150, loss[loss=0.06722, simple_loss=0.0904, pruned_loss=0.01095, audio_tagging_loss=0.01107, over 16478.00 frames. ], tot_loss[loss=0.07101, simple_loss=0.09035, pruned_loss=0.01198, audio_tagging_loss=0.01386, over 1610944.10 frames. ], batch size: 62, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:54:11,809 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-28 17:54:24,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-28 17:54:25,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-28 17:54:32,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. 
limit=15.0 2023-11-28 17:54:32,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-28 17:54:36,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-28 17:54:43,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-28 17:54:46,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3608453.3333333335, ans=0.05 2023-11-28 17:54:49,308 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 200, loss[loss=0.05757, simple_loss=0.07886, pruned_loss=0.009825, audio_tagging_loss=0.008319, over 14924.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09148, pruned_loss=0.0122, audio_tagging_loss=0.01217, over 1932830.51 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:54:51,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 9.120e+01 9.843e+01 1.065e+02 1.310e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-28 17:55:00,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3608586.6666666665, ans=0.125 2023-11-28 17:55:13,226 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-28 17:55:16,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-28 17:55:21,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3608653.3333333335, ans=0.125 2023-11-28 17:55:27,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:35,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:36,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:42,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. limit=6.0 2023-11-28 17:55:47,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3608786.6666666665, ans=0.0 2023-11-28 17:55:48,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608786.6666666665, ans=0.1 2023-11-28 17:55:50,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3608853.3333333335, ans=0.125 2023-11-28 17:55:50,922 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 250, loss[loss=0.09228, simple_loss=0.1285, pruned_loss=0.02328, audio_tagging_loss=0.004751, over 16059.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.09186, pruned_loss=0.01238, audio_tagging_loss=0.01105, over 2181998.97 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:13,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2023-11-28 17:56:16,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-28 17:56:22,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.35 vs. limit=10.0 2023-11-28 17:56:32,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609053.3333333335, ans=0.1 2023-11-28 17:56:35,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=12.0 2023-11-28 17:56:41,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2023-11-28 17:56:53,112 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 300, loss[loss=0.0483, simple_loss=0.05713, pruned_loss=0.007004, audio_tagging_loss=0.01273, over 14578.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09114, pruned_loss=0.01226, audio_tagging_loss=0.0104, over 2371334.00 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:54,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2023-11-28 17:56:55,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.170e+01 9.069e+01 9.733e+01 1.020e+02 1.805e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 17:56:56,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:57:03,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3609186.6666666665, ans=0.0 2023-11-28 17:57:17,700 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-28 17:57:23,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3609320.0, ans=0.125 2023-11-28 17:57:30,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3609386.6666666665, ans=0.2 2023-11-28 17:57:38,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-28 17:57:55,488 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 350, loss[loss=0.08422, simple_loss=0.123, pruned_loss=0.01457, audio_tagging_loss=0.008173, over 15506.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09097, pruned_loss=0.01221, audio_tagging_loss=0.009831, over 2523734.64 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:57:59,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. 
limit=15.0 2023-11-28 17:58:00,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609520.0, ans=0.1 2023-11-28 17:58:04,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3609520.0, ans=0.0 2023-11-28 17:58:09,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3609586.6666666665, ans=0.5 2023-11-28 17:58:10,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-11-28 17:58:19,660 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-28 17:58:21,044 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:58:23,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-28 17:58:31,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0 2023-11-28 17:58:40,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3609720.0, ans=0.0 2023-11-28 17:58:48,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3609786.6666666665, ans=0.125 2023-11-28 17:58:57,268 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 400, loss[loss=0.05664, simple_loss=0.07253, pruned_loss=0.01039, audio_tagging_loss=0.009987, over 14938.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08998, pruned_loss=0.01201, audio_tagging_loss=0.009546, over 2639998.91 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 17:58:59,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.699e+01 9.057e+01 9.604e+01 1.022e+02 1.428e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:59:04,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609853.3333333335, ans=0.1 2023-11-28 17:59:08,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3609920.0, ans=0.025 2023-11-28 17:59:17,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3609920.0, ans=0.125 2023-11-28 17:59:21,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-28 17:59:43,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3610053.3333333335, ans=0.125 2023-11-28 17:59:58,030 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 450, loss[loss=0.06981, simple_loss=0.09112, pruned_loss=0.01445, audio_tagging_loss=0.009806, over 14926.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09064, pruned_loss=0.012, audio_tagging_loss=0.009227, over 2730857.56 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:00:07,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3610186.6666666665, ans=0.1 2023-11-28 18:00:23,715 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-28 18:00:23,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3610320.0, ans=0.1 2023-11-28 18:00:33,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-28 18:00:40,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-11-28 18:01:00,954 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 500, loss[loss=0.05018, simple_loss=0.06482, pruned_loss=0.008112, audio_tagging_loss=0.009657, over 13508.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09013, pruned_loss=0.01206, audio_tagging_loss=0.009028, over 2796534.59 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:01:04,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.722e+01 9.408e+01 1.020e+02 1.286e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 18:01:09,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3610520.0, ans=0.125 2023-11-28 18:01:25,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-28 18:01:41,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-28 18:02:01,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3610853.3333333335, ans=0.125 2023-11-28 18:02:02,632 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 550, loss[loss=0.06666, simple_loss=0.09324, pruned_loss=0.01031, audio_tagging_loss=0.009731, over 15054.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08981, pruned_loss=0.01207, audio_tagging_loss=0.009004, over 2849782.71 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:02:08,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=12.0 2023-11-28 18:02:17,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-28 18:02:18,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-28 18:02:27,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-28 18:02:47,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-28 18:03:01,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3611120.0, ans=0.1 2023-11-28 18:03:04,239 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 600, loss[loss=0.05027, simple_loss=0.05886, pruned_loss=0.007225, audio_tagging_loss=0.01361, over 13569.00 frames. 
], tot_loss[loss=0.06569, simple_loss=0.08956, pruned_loss=0.01199, audio_tagging_loss=0.008916, over 2891778.66 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:03:05,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-28 18:03:05,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3611186.6666666665, ans=0.05 2023-11-28 18:03:07,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.115e+01 9.737e+01 1.046e+02 1.247e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 18:03:08,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3611186.6666666665, ans=0.2 2023-11-28 18:03:17,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3611253.3333333335, ans=0.2 2023-11-28 18:03:20,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3611253.3333333335, ans=10.0 2023-11-28 18:03:20,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3611253.3333333335, ans=0.125 2023-11-28 18:03:22,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3611253.3333333335, ans=0.125 2023-11-28 18:03:29,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-28 18:03:35,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5 2023-11-28 18:03:45,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3611386.6666666665, ans=0.0 2023-11-28 18:03:47,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3611386.6666666665, ans=0.125 2023-11-28 18:03:53,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611453.3333333335, ans=0.1 2023-11-28 18:03:59,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=22.5 2023-11-28 18:04:05,981 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 650, loss[loss=0.07016, simple_loss=0.1039, pruned_loss=0.009811, audio_tagging_loss=0.008399, over 15556.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09033, pruned_loss=0.01216, audio_tagging_loss=0.008767, over 2920266.15 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:04:15,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3611520.0, ans=0.2 2023-11-28 18:04:15,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.11 vs. 
limit=22.5 2023-11-28 18:04:19,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3611586.6666666665, ans=0.125 2023-11-28 18:04:31,674 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-28 18:04:36,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3611653.3333333335, ans=0.1 2023-11-28 18:04:42,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3611720.0, ans=0.125 2023-11-28 18:04:56,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2023-11-28 18:05:08,070 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 700, loss[loss=0.0838, simple_loss=0.1179, pruned_loss=0.016, audio_tagging_loss=0.00883, over 15053.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09023, pruned_loss=0.01204, audio_tagging_loss=0.008783, over 2957158.32 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:05:12,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.893e+01 9.585e+01 1.037e+02 1.398e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 18:05:16,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3611853.3333333335, ans=0.0 2023-11-28 18:05:16,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3611853.3333333335, ans=0.0 2023-11-28 18:05:28,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3611920.0, ans=0.0 2023-11-28 18:05:33,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-28 18:06:06,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3612120.0, ans=0.125 2023-11-28 18:06:11,815 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 750, loss[loss=0.06959, simple_loss=0.08418, pruned_loss=0.01847, audio_tagging_loss=0.009025, over 14911.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09053, pruned_loss=0.01213, audio_tagging_loss=0.008848, over 2977087.66 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:06:12,145 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:06:23,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3612253.3333333335, ans=0.125 2023-11-28 18:06:36,878 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-28 18:06:37,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3612320.0, ans=0.0 2023-11-28 18:06:46,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3612320.0, ans=0.125 2023-11-28 18:06:54,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3612386.6666666665, ans=0.125 2023-11-28 18:07:01,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3612453.3333333335, ans=0.1 2023-11-28 18:07:05,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3612453.3333333335, ans=0.0 2023-11-28 18:07:14,141 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 800, loss[loss=0.06428, simple_loss=0.08005, pruned_loss=0.0094, audio_tagging_loss=0.01485, over 14367.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09096, pruned_loss=0.0123, audio_tagging_loss=0.008877, over 2998008.30 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:07:16,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612520.0, ans=0.1 2023-11-28 18:07:17,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.017e+01 9.748e+01 1.044e+02 1.462e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 18:07:27,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3612586.6666666665, ans=0.2 2023-11-28 18:07:39,736 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-28 18:07:43,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3612653.3333333335, ans=0.1 2023-11-28 18:07:45,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3612653.3333333335, ans=0.125 2023-11-28 18:07:50,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612720.0, ans=0.1 2023-11-28 18:07:54,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:07:59,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3612720.0, ans=0.05 2023-11-28 18:08:03,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3612786.6666666665, ans=0.5 2023-11-28 18:08:16,497 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 850, loss[loss=0.06201, simple_loss=0.08289, pruned_loss=0.01132, audio_tagging_loss=0.009244, over 14405.00 frames. 
], tot_loss[loss=0.06639, simple_loss=0.09041, pruned_loss=0.01229, audio_tagging_loss=0.008899, over 3006919.27 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:08:24,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-28 18:08:41,277 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-28 18:08:57,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3613053.3333333335, ans=0.0 2023-11-28 18:09:18,560 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 900, loss[loss=0.06305, simple_loss=0.09263, pruned_loss=0.01072, audio_tagging_loss=0.006011, over 14806.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09099, pruned_loss=0.01231, audio_tagging_loss=0.00893, over 3017224.88 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:09:19,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3613186.6666666665, ans=0.0 2023-11-28 18:09:20,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3613186.6666666665, ans=0.125 2023-11-28 18:09:24,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.864e+01 9.446e+01 1.016e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:09:28,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3613186.6666666665, ans=0.2 2023-11-28 18:09:29,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3613253.3333333335, ans=0.0 2023-11-28 18:09:38,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3613253.3333333335, ans=0.125 2023-11-28 18:09:43,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-28 18:09:57,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-28 18:10:08,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3613453.3333333335, ans=0.125 2023-11-28 18:10:16,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3613453.3333333335, ans=0.0 2023-11-28 18:10:20,687 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 950, loss[loss=0.05515, simple_loss=0.07438, pruned_loss=0.007927, audio_tagging_loss=0.01004, over 14163.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09002, pruned_loss=0.01214, audio_tagging_loss=0.008855, over 3026093.68 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:10:45,429 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-28 18:11:03,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3613720.0, ans=0.2 2023-11-28 18:11:08,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3613786.6666666665, ans=0.0 2023-11-28 18:11:17,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3613786.6666666665, ans=0.0 2023-11-28 18:11:19,966 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:11:21,883 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1000, loss[loss=0.08111, simple_loss=0.1179, pruned_loss=0.0135, audio_tagging_loss=0.008656, over 15215.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08954, pruned_loss=0.01201, audio_tagging_loss=0.008758, over 3028250.88 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:11:27,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 8.919e+01 9.596e+01 1.036e+02 1.232e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 18:11:31,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-28 18:11:46,667 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-28 18:11:50,895 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:11:53,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3613986.6666666665, ans=0.125 2023-11-28 18:11:57,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3613986.6666666665, ans=0.1 2023-11-28 18:12:00,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0 2023-11-28 18:12:24,610 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1050, loss[loss=0.05375, simple_loss=0.07145, pruned_loss=0.007926, audio_tagging_loss=0.0101, over 16422.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08923, pruned_loss=0.01186, audio_tagging_loss=0.008656, over 3030230.57 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:12:33,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3614186.6666666665, ans=0.5 2023-11-28 18:12:41,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3614253.3333333335, ans=0.2 2023-11-28 18:12:49,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-28 18:13:00,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3614386.6666666665, ans=0.125 2023-11-28 18:13:06,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3614386.6666666665, ans=0.07 2023-11-28 18:13:15,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3614453.3333333335, ans=0.2 2023-11-28 18:13:25,774 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:13:26,584 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1100, loss[loss=0.05323, simple_loss=0.0669, pruned_loss=0.00974, audio_tagging_loss=0.01004, over 14756.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0896, pruned_loss=0.01202, audio_tagging_loss=0.008556, over 3036437.84 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:13:31,277 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:13:32,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 8.948e+01 9.564e+01 1.065e+02 1.707e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 18:13:35,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3614520.0, ans=0.0 2023-11-28 18:13:41,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. 
limit=22.5 2023-11-28 18:13:48,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3614586.6666666665, ans=0.125 2023-11-28 18:13:50,860 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-28 18:14:04,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3614720.0, ans=0.125 2023-11-28 18:14:05,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3614720.0, ans=0.2 2023-11-28 18:14:08,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3614720.0, ans=0.125 2023-11-28 18:14:28,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3614853.3333333335, ans=0.2 2023-11-28 18:14:28,996 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1150, loss[loss=0.06298, simple_loss=0.08419, pruned_loss=0.01189, audio_tagging_loss=0.008996, over 14655.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08898, pruned_loss=0.01205, audio_tagging_loss=0.008495, over 3040036.80 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:14:53,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-28 18:15:02,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3614986.6666666665, ans=0.1 2023-11-28 18:15:11,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3615053.3333333335, ans=0.125 2023-11-28 18:15:15,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3615053.3333333335, ans=0.0 2023-11-28 18:15:24,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. limit=10.0 2023-11-28 18:15:28,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3615120.0, ans=0.1 2023-11-28 18:15:31,099 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1200, loss[loss=0.07087, simple_loss=0.09572, pruned_loss=0.01462, audio_tagging_loss=0.008385, over 13654.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08869, pruned_loss=0.01201, audio_tagging_loss=0.008471, over 3038501.24 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:15:33,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=8.0 2023-11-28 18:15:36,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.860e+01 9.476e+01 1.010e+02 2.147e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-28 18:15:51,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.47 vs. 
limit=15.0 2023-11-28 18:15:55,751 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-28 18:16:08,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3615386.6666666665, ans=0.0 2023-11-28 18:16:11,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3615386.6666666665, ans=0.0 2023-11-28 18:16:28,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2023-11-28 18:16:33,388 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1250, loss[loss=0.04757, simple_loss=0.05987, pruned_loss=0.007318, audio_tagging_loss=0.01032, over 14176.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08826, pruned_loss=0.01193, audio_tagging_loss=0.008471, over 3043520.46 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:16:44,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3615586.6666666665, ans=0.2 2023-11-28 18:16:57,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.09 vs. limit=22.5 2023-11-28 18:16:57,676 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-28 18:17:13,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-11-28 18:17:19,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3615720.0, ans=0.0 2023-11-28 18:17:35,296 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1300, loss[loss=0.06268, simple_loss=0.08642, pruned_loss=0.01296, audio_tagging_loss=0.006505, over 13084.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.08731, pruned_loss=0.01174, audio_tagging_loss=0.008474, over 3037501.45 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:17:41,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.784e+01 9.305e+01 1.002e+02 1.226e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 18:17:44,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3615853.3333333335, ans=0.125 2023-11-28 18:17:52,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3615920.0, ans=0.0 2023-11-28 18:17:56,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3615920.0, ans=0.125 2023-11-28 18:17:59,165 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-28 18:18:03,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3615986.6666666665, ans=0.09899494936611666 2023-11-28 18:18:09,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. 
limit=10.0 2023-11-28 18:18:16,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3616053.3333333335, ans=0.0 2023-11-28 18:18:20,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3616053.3333333335, ans=0.0 2023-11-28 18:18:22,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=12.0 2023-11-28 18:18:37,357 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1350, loss[loss=0.06656, simple_loss=0.09958, pruned_loss=0.009144, audio_tagging_loss=0.007628, over 14530.00 frames. ], tot_loss[loss=0.0641, simple_loss=0.08758, pruned_loss=0.01175, audio_tagging_loss=0.00856, over 3043856.33 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:19:00,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3616253.3333333335, ans=0.0 2023-11-28 18:19:02,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-28 18:19:08,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3616320.0, ans=0.125 2023-11-28 18:19:23,292 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:19:30,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-28 18:19:38,413 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1400, loss[loss=0.04852, simple_loss=0.0642, pruned_loss=0.006173, audio_tagging_loss=0.01025, over 14763.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08875, pruned_loss=0.01198, audio_tagging_loss=0.008602, over 3046076.87 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:19:41,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3616520.0, ans=0.0 2023-11-28 18:19:45,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.002e+01 9.471e+01 1.001e+02 1.235e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 18:20:03,630 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-28 18:20:16,071 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:20:26,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3616720.0, ans=0.125 2023-11-28 18:20:40,622 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1450, loss[loss=0.0589, simple_loss=0.07623, pruned_loss=0.01306, audio_tagging_loss=0.007724, over 14383.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08931, pruned_loss=0.01208, audio_tagging_loss=0.008632, over 3045743.60 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:20:40,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3616853.3333333335, ans=0.05 2023-11-28 18:20:42,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-28 18:20:43,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3616853.3333333335, ans=0.04949747468305833 2023-11-28 18:20:46,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-28 18:20:47,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-28 18:20:55,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3616920.0, ans=0.125 2023-11-28 18:21:03,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-28 18:21:05,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-28 18:21:14,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3616986.6666666665, ans=0.07 2023-11-28 18:21:26,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3617053.3333333335, ans=0.125 2023-11-28 18:21:31,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3617120.0, ans=0.5 2023-11-28 18:21:42,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3617186.6666666665, ans=0.0 2023-11-28 18:21:42,944 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1500, loss[loss=0.05808, simple_loss=0.07653, pruned_loss=0.00901, audio_tagging_loss=0.01081, over 15872.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0892, pruned_loss=0.01207, audio_tagging_loss=0.008702, over 3041168.86 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:21:50,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.243e+01 1.008e+02 1.066e+02 1.395e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-28 18:22:07,897 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-28 18:22:11,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-28 18:22:45,110 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1550, loss[loss=0.06585, simple_loss=0.08404, pruned_loss=0.01229, audio_tagging_loss=0.01154, over 14918.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08846, pruned_loss=0.01196, audio_tagging_loss=0.008876, over 3042925.22 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:23:10,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-28 18:23:12,768 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:23:19,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-28 18:23:20,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-28 18:23:20,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2023-11-28 18:23:36,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-28 18:23:38,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3617786.6666666665, ans=0.1 2023-11-28 18:23:41,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3617786.6666666665, ans=0.125 2023-11-28 18:23:47,219 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1600, loss[loss=0.05848, simple_loss=0.0813, pruned_loss=0.009191, audio_tagging_loss=0.008639, over 14932.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08887, pruned_loss=0.01197, audio_tagging_loss=0.008916, over 3042164.62 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:23:51,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-28 18:23:54,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.133e+01 9.762e+01 1.043e+02 1.262e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 18:24:11,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-28 18:24:40,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3618120.0, ans=10.0 2023-11-28 18:24:48,508 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1650, loss[loss=0.07129, simple_loss=0.09777, pruned_loss=0.01139, audio_tagging_loss=0.01101, over 14864.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08866, pruned_loss=0.01191, audio_tagging_loss=0.008961, over 3039390.11 frames. 
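The `train_asr.py:1235` summary lines print the combined objective next to its parts, and the figures are internally consistent with a fixed weighting of 0.5 on `simple_loss` and 1.0 on both `pruned_loss` and `audio_tagging_loss` (inferred from the numbers themselves, not quoted from the script; the same weighting fits every summary line in this stretch). For batch 1600 above: 0.5 * 0.08887 + 0.01197 + 0.008916 = 0.06532, exactly the logged `loss`. A two-line check:

```python
# Recompute the batch-1600 tot_loss from its logged components,
# assuming loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
# (weights inferred from the log, not quoted from train_asr.py).
simple_loss, pruned_loss, audio_tagging_loss = 0.08887, 0.01197, 0.008916
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 5))  # 0.06532, matching the line above
```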
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:24:51,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3618186.6666666665, ans=0.0 2023-11-28 18:24:56,404 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:25:08,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3618253.3333333335, ans=0.0 2023-11-28 18:25:13,453 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-28 18:25:20,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3618320.0, ans=0.5 2023-11-28 18:25:21,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3618320.0, ans=0.2 2023-11-28 18:25:31,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-28 18:25:49,861 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1700, loss[loss=0.05893, simple_loss=0.07731, pruned_loss=0.01122, audio_tagging_loss=0.009059, over 15282.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08848, pruned_loss=0.01194, audio_tagging_loss=0.00891, over 3043200.83 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:25:52,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2023-11-28 18:25:57,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.880e+01 9.352e+01 1.002e+02 1.354e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 18:26:04,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2023-11-28 18:26:15,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-28 18:26:24,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3618653.3333333335, ans=10.0 2023-11-28 18:26:49,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3618786.6666666665, ans=0.0 2023-11-28 18:26:52,323 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1750, loss[loss=0.06572, simple_loss=0.09773, pruned_loss=0.00797, audio_tagging_loss=0.008881, over 16531.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08793, pruned_loss=0.01176, audio_tagging_loss=0.008907, over 3044872.93 frames. 
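The `optim.py:476` lines report quartiles of recent gradient norms together with the clipping threshold, and in every record in this stretch the threshold is almost exactly `Clipping_scale` times the median, e.g. 2.0 * 9.352e+01 ≈ 1.870e+02 just above. Clipping therefore appears to be relative to a running median of grad norms rather than a fixed constant. A hedged sketch of that idea follows; it is not ScaledAdam's actual code, and the names are illustrative.

```python
# Sketch of median-relative gradient clipping consistent with the
# optim.py lines above: threshold ~= clipping_scale * median grad-norm
# over a recent window (whose sorted contents give the quartiles).
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:  # counted by the "percent-clipped" stat
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```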
], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:26:54,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-28 18:27:01,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3618853.3333333335, ans=0.1 2023-11-28 18:27:11,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3618920.0, ans=0.0 2023-11-28 18:27:12,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3618920.0, ans=0.2 2023-11-28 18:27:17,752 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-28 18:27:22,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3618986.6666666665, ans=0.125 2023-11-28 18:27:53,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3619186.6666666665, ans=0.1 2023-11-28 18:27:54,817 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1800, loss[loss=0.05522, simple_loss=0.0729, pruned_loss=0.009902, audio_tagging_loss=0.008866, over 14834.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08774, pruned_loss=0.01168, audio_tagging_loss=0.00875, over 3044119.02 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:28:02,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.817e+01 9.553e+01 1.013e+02 1.527e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 18:28:08,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3619253.3333333335, ans=0.025 2023-11-28 18:28:12,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-28 18:28:19,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-28 18:28:32,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3619386.6666666665, ans=0.1 2023-11-28 18:28:33,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3619386.6666666665, ans=0.2 2023-11-28 18:28:34,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2023-11-28 18:28:42,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-28 18:28:56,454 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1850, loss[loss=0.07289, simple_loss=0.11, pruned_loss=0.01165, audio_tagging_loss=0.00623, over 14978.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08826, pruned_loss=0.01168, audio_tagging_loss=0.008659, over 3044920.98 frames. 
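`grad_scale` in the summary lines is the dynamic loss-scaling factor of fp16 training; it moves between 8.0 and 32.0 across the batches above, the usual pattern of halving on an overflow and doubling after a run of clean steps. The generic PyTorch AMP loop below produces exactly this quantity via `scaler.get_scale()`; the tiny model and data are placeholders, and that the recipe logs precisely this value is my assumption.

```python
# Generic torch.cuda.amp training step; scaler.get_scale() is
# (presumably) the quantity the log reports as grad_scale.
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler(growth_interval=2000)

for _ in range(10):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    opt.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(opt)   # skipped internally if gradients overflowed
    scaler.update()    # halves the scale on overflow, grows it otherwise
    print(scaler.get_scale())
```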
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:29:21,039 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-28 18:29:58,109 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1900, loss[loss=0.04756, simple_loss=0.05805, pruned_loss=0.004076, audio_tagging_loss=0.01446, over 15373.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08925, pruned_loss=0.01183, audio_tagging_loss=0.008593, over 3048356.33 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:30:06,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.846e+01 9.695e+01 1.030e+02 1.290e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:30:24,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-28 18:30:47,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3620053.3333333335, ans=0.2 2023-11-28 18:31:01,999 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 1950, loss[loss=0.07155, simple_loss=0.0981, pruned_loss=0.01317, audio_tagging_loss=0.009328, over 16490.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08938, pruned_loss=0.01196, audio_tagging_loss=0.008484, over 3051997.41 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:31:13,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3620186.6666666665, ans=0.1 2023-11-28 18:31:14,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620253.3333333335, ans=0.125 2023-11-28 18:31:25,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2023-11-28 18:31:27,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-28 18:31:42,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3620386.6666666665, ans=0.125 2023-11-28 18:31:45,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3620386.6666666665, ans=0.125 2023-11-28 18:32:05,241 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2000, loss[loss=0.05882, simple_loss=0.07222, pruned_loss=0.01224, audio_tagging_loss=0.01047, over 14654.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08789, pruned_loss=0.01181, audio_tagging_loss=0.008595, over 3046282.95 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:32:12,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.843e+01 9.517e+01 1.017e+02 1.675e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 18:32:13,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3620520.0, ans=0.0 2023-11-28 18:32:17,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3620586.6666666665, ans=0.0 2023-11-28 18:32:20,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. 
limit=15.0 2023-11-28 18:32:22,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3620586.6666666665, ans=0.125 2023-11-28 18:32:30,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-28 18:33:07,924 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2050, loss[loss=0.07581, simple_loss=0.1086, pruned_loss=0.01433, audio_tagging_loss=0.007181, over 15846.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08795, pruned_loss=0.01189, audio_tagging_loss=0.008615, over 3038861.41 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:33:17,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3620853.3333333335, ans=0.125 2023-11-28 18:33:32,773 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-28 18:33:43,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3620986.6666666665, ans=0.1 2023-11-28 18:34:02,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-28 18:34:09,475 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2100, loss[loss=0.06472, simple_loss=0.08725, pruned_loss=0.01176, audio_tagging_loss=0.009335, over 16015.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08809, pruned_loss=0.01186, audio_tagging_loss=0.008588, over 3042704.78 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:34:10,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3621186.6666666665, ans=0.0 2023-11-28 18:34:13,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3621186.6666666665, ans=0.0 2023-11-28 18:34:17,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.878e+01 9.444e+01 1.002e+02 1.258e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:34:18,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=15.0 2023-11-28 18:34:34,190 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-28 18:34:36,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3621320.0, ans=0.125 2023-11-28 18:34:37,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3621320.0, ans=0.125 2023-11-28 18:34:38,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3621320.0, ans=0.5 2023-11-28 18:34:59,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3621453.3333333335, ans=0.2 2023-11-28 18:35:12,351 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2150, loss[loss=0.06031, simple_loss=0.08315, pruned_loss=0.01035, audio_tagging_loss=0.008385, over 16446.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08983, pruned_loss=0.01221, audio_tagging_loss=0.008529, over 3046575.49 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:35:16,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2023-11-28 18:35:36,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-28 18:35:50,160 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:35:57,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3621720.0, ans=0.125 2023-11-28 18:36:01,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-11-28 18:36:11,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=15.0 2023-11-28 18:36:13,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3621853.3333333335, ans=0.1 2023-11-28 18:36:14,606 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2200, loss[loss=0.04577, simple_loss=0.06582, pruned_loss=0.005209, audio_tagging_loss=0.007648, over 15154.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08895, pruned_loss=0.01208, audio_tagging_loss=0.008636, over 3043577.00 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:36:22,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.070e+01 9.676e+01 1.027e+02 1.399e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 18:36:31,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3621920.0, ans=0.1 2023-11-28 18:36:38,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-28 18:37:16,438 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2250, loss[loss=0.06368, simple_loss=0.08819, pruned_loss=0.01054, audio_tagging_loss=0.00904, over 15139.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08917, pruned_loss=0.01209, audio_tagging_loss=0.008635, over 3049536.14 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:37:31,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3622253.3333333335, ans=0.0 2023-11-28 18:37:41,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-28 18:37:43,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3622320.0, ans=0.0 2023-11-28 18:38:08,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3622453.3333333335, ans=0.015 2023-11-28 18:38:17,971 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2300, loss[loss=0.06583, simple_loss=0.09802, pruned_loss=0.009861, audio_tagging_loss=0.006963, over 14814.00 frames. 
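The WARNING above explains itself once the two frame counts are compared: a 1-second AudioSet clip has 100 feature frames, roughly (100 - 7) // 4 = 23 after the 4x convolutional subsampling, and 23 output frames cannot carry the 24 BPE tokens of the placeholder transcript, so the transducer loss would have no valid alignment and the cut is dropped. A minimal sketch of such a filter; the function names and the exact subsampling formula are illustrative, not lifted from train_asr.py.

```python
# Drop cuts whose post-subsampling frame count is below their token
# count, mirroring the WARNINGs in this log. (100 - 7) // 4 = 23
# reproduces the logged before/after numbers but is only a rough
# model of the real subsampling layer.
def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = frames_after_subsampling(num_frames)
    if t < num_tokens:
        print(f"WARNING: exclude cut: {t} frames < {num_tokens} tokens")
        return False
    return True

print(keep_cut(100, 24))  # False, like the excluded dummy cuts
```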
], tot_loss[loss=0.06505, simple_loss=0.0888, pruned_loss=0.01194, audio_tagging_loss=0.008711, over 3046821.55 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:38:19,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3622520.0, ans=0.125 2023-11-28 18:38:21,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3622520.0, ans=0.125 2023-11-28 18:38:26,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.756e+01 9.268e+01 1.034e+02 1.497e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 18:38:27,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3622520.0, ans=0.125 2023-11-28 18:38:29,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:31,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:42,580 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-28 18:39:05,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3622720.0, ans=0.0 2023-11-28 18:39:14,341 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:39:20,161 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2350, loss[loss=0.05756, simple_loss=0.07525, pruned_loss=0.01055, audio_tagging_loss=0.009383, over 14922.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08877, pruned_loss=0.012, audio_tagging_loss=0.008764, over 3044504.18 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:39:23,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3622853.3333333335, ans=0.0 2023-11-28 18:39:26,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-11-28 18:39:28,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3622853.3333333335, ans=0.1 2023-11-28 18:39:34,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-28 18:39:45,292 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-28 18:39:58,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.34 vs. 
limit=15.0 2023-11-28 18:40:05,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3623053.3333333335, ans=0.0 2023-11-28 18:40:22,009 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2400, loss[loss=0.08773, simple_loss=0.129, pruned_loss=0.01524, audio_tagging_loss=0.007975, over 15098.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0889, pruned_loss=0.01208, audio_tagging_loss=0.00882, over 3044363.34 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:40:26,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3623186.6666666665, ans=0.125 2023-11-28 18:40:27,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3623186.6666666665, ans=0.125 2023-11-28 18:40:30,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.938e+01 9.455e+01 1.032e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 18:40:32,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3623186.6666666665, ans=0.025 2023-11-28 18:40:35,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2023-11-28 18:40:46,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-28 18:41:23,590 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2450, loss[loss=0.0891, simple_loss=0.1168, pruned_loss=0.02225, audio_tagging_loss=0.008434, over 14691.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08943, pruned_loss=0.01232, audio_tagging_loss=0.008936, over 3046597.50 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:41:40,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3623586.6666666665, ans=0.125 2023-11-28 18:41:40,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3623586.6666666665, ans=0.0 2023-11-28 18:41:46,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2023-11-28 18:41:49,460 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-28 18:41:49,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3623653.3333333335, ans=0.1 2023-11-28 18:42:00,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3623720.0, ans=0.0 2023-11-28 18:42:05,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=22.5 2023-11-28 18:42:14,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3623786.6666666665, ans=0.125 2023-11-28 18:42:25,837 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2500, loss[loss=0.08885, simple_loss=0.1293, pruned_loss=0.01868, audio_tagging_loss=0.00553, over 15916.00 frames. 
], tot_loss[loss=0.06607, simple_loss=0.08977, pruned_loss=0.01225, audio_tagging_loss=0.008931, over 3046507.50 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:42:35,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.803e+01 9.255e+01 1.000e+02 1.311e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 18:42:36,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2023-11-28 18:42:51,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-28 18:42:52,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3623986.6666666665, ans=0.0 2023-11-28 18:43:21,882 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:43:28,648 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2550, loss[loss=0.07716, simple_loss=0.1035, pruned_loss=0.01634, audio_tagging_loss=0.009057, over 15221.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09007, pruned_loss=0.01225, audio_tagging_loss=0.00881, over 3043172.59 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:43:30,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3624186.6666666665, ans=0.95 2023-11-28 18:43:32,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3624186.6666666665, ans=0.02 2023-11-28 18:43:50,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3624253.3333333335, ans=0.125 2023-11-28 18:43:53,724 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-28 18:44:01,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3624320.0, ans=0.0 2023-11-28 18:44:15,021 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:44:15,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=22.5 2023-11-28 18:44:18,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3624453.3333333335, ans=0.125 2023-11-28 18:44:22,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3624453.3333333335, ans=0.0 2023-11-28 18:44:26,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3624453.3333333335, ans=0.2 2023-11-28 18:44:29,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3624520.0, ans=0.125 2023-11-28 18:44:30,692 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2600, loss[loss=0.0774, simple_loss=0.1066, pruned_loss=0.01685, audio_tagging_loss=0.007227, over 14511.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08888, pruned_loss=0.01206, audio_tagging_loss=0.008694, over 3040248.13 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:44:33,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3624520.0, ans=0.125 2023-11-28 18:44:38,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3624520.0, ans=0.2 2023-11-28 18:44:39,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.738e+01 9.385e+01 1.004e+02 1.373e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 18:44:56,248 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-28 18:45:05,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-28 18:45:20,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3624786.6666666665, ans=0.125 2023-11-28 18:45:32,765 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2650, loss[loss=0.08136, simple_loss=0.1177, pruned_loss=0.01373, audio_tagging_loss=0.008755, over 16358.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08836, pruned_loss=0.01207, audio_tagging_loss=0.008676, over 3044490.16 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:45:58,447 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-28 18:46:23,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3625120.0, ans=0.125 2023-11-28 18:46:29,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3625120.0, ans=0.125 2023-11-28 18:46:35,475 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2700, loss[loss=0.06326, simple_loss=0.09161, pruned_loss=0.01061, audio_tagging_loss=0.006852, over 15643.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08876, pruned_loss=0.01221, audio_tagging_loss=0.008594, over 3053299.23 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:46:39,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2023-11-28 18:46:44,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.009e+01 9.559e+01 1.022e+02 1.303e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 18:46:49,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3625253.3333333335, ans=0.125 2023-11-28 18:47:00,435 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-28 18:47:21,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3625386.6666666665, ans=0.125 2023-11-28 18:47:37,938 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2750, loss[loss=0.08742, simple_loss=0.1292, pruned_loss=0.0156, audio_tagging_loss=0.007205, over 14374.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08829, pruned_loss=0.01214, audio_tagging_loss=0.008667, over 3051231.31 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:47:42,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.75 vs. 
limit=22.5 2023-11-28 18:47:44,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3625520.0, ans=0.0 2023-11-28 18:48:02,699 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-28 18:48:20,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3625720.0, ans=0.05 2023-11-28 18:48:20,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3625720.0, ans=0.0 2023-11-28 18:48:23,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3625720.0, ans=0.125 2023-11-28 18:48:32,360 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:48:33,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2023-11-28 18:48:39,506 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2800, loss[loss=0.06647, simple_loss=0.09202, pruned_loss=0.01172, audio_tagging_loss=0.008743, over 15856.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.0877, pruned_loss=0.01193, audio_tagging_loss=0.008659, over 3058480.83 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:48:49,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.943e+01 9.576e+01 1.040e+02 1.629e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 18:48:52,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3625920.0, ans=0.0 2023-11-28 18:48:54,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5 2023-11-28 18:48:59,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3625920.0, ans=22.5 2023-11-28 18:49:05,251 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-28 18:49:05,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-28 18:49:06,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3625986.6666666665, ans=0.0 2023-11-28 18:49:17,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.36 vs. 
limit=22.5 2023-11-28 18:49:25,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3626053.3333333335, ans=0.125 2023-11-28 18:49:31,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3626120.0, ans=0.0 2023-11-28 18:49:38,281 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:49:41,578 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2850, loss[loss=0.05482, simple_loss=0.07942, pruned_loss=0.007055, audio_tagging_loss=0.008051, over 16483.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.0871, pruned_loss=0.01181, audio_tagging_loss=0.008683, over 3053256.29 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:49:49,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3626186.6666666665, ans=0.0 2023-11-28 18:49:52,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3626186.6666666665, ans=0.2 2023-11-28 18:50:00,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3626253.3333333335, ans=0.05 2023-11-28 18:50:06,842 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-28 18:50:24,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2023-11-28 18:50:43,946 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2900, loss[loss=0.06165, simple_loss=0.0822, pruned_loss=0.0114, audio_tagging_loss=0.009142, over 14551.00 frames. ], tot_loss[loss=0.06364, simple_loss=0.08658, pruned_loss=0.01168, audio_tagging_loss=0.008672, over 3047597.59 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:50:55,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.790e+01 9.510e+01 1.033e+02 1.199e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 18:50:58,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3626586.6666666665, ans=0.125 2023-11-28 18:51:01,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2023-11-28 18:51:08,904 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-28 18:51:15,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3626653.3333333335, ans=0.125 2023-11-28 18:51:30,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3626720.0, ans=0.125 2023-11-28 18:51:30,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.36 vs. 
limit=12.0 2023-11-28 18:51:46,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3626786.6666666665, ans=0.125 2023-11-28 18:51:46,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3626786.6666666665, ans=0.125 2023-11-28 18:51:48,524 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 2950, loss[loss=0.06894, simple_loss=0.09766, pruned_loss=0.01212, audio_tagging_loss=0.007985, over 13991.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08783, pruned_loss=0.01178, audio_tagging_loss=0.008601, over 3048262.77 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:52:04,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 18:52:08,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3626920.0, ans=0.0 2023-11-28 18:52:13,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-28 18:52:28,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3627053.3333333335, ans=0.04949747468305833 2023-11-28 18:52:29,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3627053.3333333335, ans=0.125 2023-11-28 18:52:31,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3627053.3333333335, ans=0.0 2023-11-28 18:52:38,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-28 18:52:50,278 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3000, loss[loss=0.05863, simple_loss=0.07984, pruned_loss=0.00837, audio_tagging_loss=0.01034, over 15731.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08784, pruned_loss=0.01184, audio_tagging_loss=0.008672, over 3046406.34 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:52:50,279 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 18:53:33,227 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05731, simple_loss=0.05055, pruned_loss=0.005328, audio_tagging_loss=0.02671, over 4681554.00 frames. 2023-11-28 18:53:33,228 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 18:53:44,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.011e+01 9.606e+01 1.015e+02 1.587e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 18:53:46,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3627253.3333333335, ans=0.0 2023-11-28 18:53:50,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3627253.3333333335, ans=0.0 2023-11-28 18:53:57,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-28 18:53:59,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3627320.0, ans=0.125 2023-11-28 18:54:06,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.13 vs. 
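Batch 3000 above triggers the periodic mid-epoch validation pass (`train_asr.py:1258`/`1267`), followed by a peak-memory report. The 25568MB figure looks like the CUDA caching allocator's high-water mark; reading it the standard way is one line, though whether the recipe uses exactly this call is my assumption.

```python
# Standard way to obtain the figure logged as "Maximum memory
# allocated so far": the CUDA allocator's high-water mark, in MB.
import torch

def peak_memory_mb(device: str = "cuda:1") -> int:
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)
```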
limit=15.0 2023-11-28 18:54:19,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3627386.6666666665, ans=0.125 2023-11-28 18:54:34,914 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3050, loss[loss=0.07253, simple_loss=0.09691, pruned_loss=0.01491, audio_tagging_loss=0.009164, over 16024.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08899, pruned_loss=0.01208, audio_tagging_loss=0.008684, over 3054468.12 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:54:59,401 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-28 18:55:13,358 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:55:21,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3627720.0, ans=0.125 2023-11-28 18:55:25,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3627786.6666666665, ans=0.125 2023-11-28 18:55:37,561 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3100, loss[loss=0.08358, simple_loss=0.1102, pruned_loss=0.01886, audio_tagging_loss=0.009604, over 15589.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08901, pruned_loss=0.0122, audio_tagging_loss=0.008692, over 3052337.04 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:55:48,881 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.039e+01 9.695e+01 1.074e+02 1.445e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:55:56,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3627920.0, ans=0.125 2023-11-28 18:55:59,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627920.0, ans=0.1 2023-11-28 18:56:03,297 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-28 18:56:03,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3627986.6666666665, ans=0.0 2023-11-28 18:56:24,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3628053.3333333335, ans=0.125 2023-11-28 18:56:39,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-28 18:56:39,975 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3150, loss[loss=0.0611, simple_loss=0.08504, pruned_loss=0.009739, audio_tagging_loss=0.00884, over 15231.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08968, pruned_loss=0.01233, audio_tagging_loss=0.008684, over 3052537.54 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:56:48,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-28 18:56:58,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3628253.3333333335, ans=0.2 2023-11-28 18:57:01,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3628253.3333333335, ans=0.04949747468305833 2023-11-28 18:57:05,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-28 18:57:27,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3628386.6666666665, ans=0.0 2023-11-28 18:57:28,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3628453.3333333335, ans=0.2 2023-11-28 18:57:42,626 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3200, loss[loss=0.06146, simple_loss=0.07819, pruned_loss=0.01583, audio_tagging_loss=0.006527, over 14471.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09097, pruned_loss=0.01257, audio_tagging_loss=0.008754, over 3050827.20 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:57:45,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-28 18:57:46,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3628520.0, ans=0.2 2023-11-28 18:57:52,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.188e+01 9.825e+01 1.034e+02 1.228e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-28 18:57:57,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3628586.6666666665, ans=0.2 2023-11-28 18:58:00,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-11-28 18:58:05,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3628653.3333333335, ans=0.125 2023-11-28 18:58:07,037 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-28 18:58:12,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3628653.3333333335, ans=0.125 2023-11-28 18:58:44,579 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3250, loss[loss=0.05239, simple_loss=0.06127, pruned_loss=0.01253, audio_tagging_loss=0.00922, over 15037.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09057, pruned_loss=0.01255, audio_tagging_loss=0.008779, over 3056139.49 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:58:44,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:58:47,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-28 18:58:52,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3628853.3333333335, ans=22.5 2023-11-28 18:59:00,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.23 vs. limit=10.0 2023-11-28 18:59:01,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628920.0, ans=0.1 2023-11-28 18:59:09,842 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-28 18:59:10,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3628986.6666666665, ans=0.125 2023-11-28 18:59:36,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-28 18:59:46,032 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3300, loss[loss=0.07643, simple_loss=0.1028, pruned_loss=0.01575, audio_tagging_loss=0.009255, over 15421.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09113, pruned_loss=0.01261, audio_tagging_loss=0.00886, over 3056664.96 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:59:49,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3629186.6666666665, ans=0.125 2023-11-28 18:59:57,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.009e+01 9.919e+01 1.085e+02 1.499e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 18:59:59,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-28 19:00:01,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3629253.3333333335, ans=0.04949747468305833 2023-11-28 19:00:07,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3629253.3333333335, ans=0.2 2023-11-28 19:00:10,711 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-28 19:00:32,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3629386.6666666665, ans=0.0 2023-11-28 19:00:43,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3629453.3333333335, ans=0.125 2023-11-28 19:00:48,546 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3350, loss[loss=0.03877, simple_loss=0.05203, pruned_loss=0.004359, audio_tagging_loss=0.0084, over 15749.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09116, pruned_loss=0.01261, audio_tagging_loss=0.008791, over 3055914.96 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:01:12,732 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-28 19:01:21,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3629653.3333333335, ans=0.125 2023-11-28 19:01:31,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3629720.0, ans=0.125 2023-11-28 19:01:49,498 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3400, loss[loss=0.07881, simple_loss=0.1153, pruned_loss=0.01498, audio_tagging_loss=0.006209, over 15883.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09117, pruned_loss=0.01264, audio_tagging_loss=0.008666, over 3053838.51 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:01:50,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3629853.3333333335, ans=0.0 2023-11-28 19:01:52,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3629853.3333333335, ans=0.125 2023-11-28 19:02:01,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.096e+01 9.800e+01 1.047e+02 1.329e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-28 19:02:04,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3629920.0, ans=0.1 2023-11-28 19:02:14,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-28 19:02:26,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-28 19:02:48,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2023-11-28 19:02:49,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3630120.0, ans=0.125 2023-11-28 19:02:51,182 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3450, loss[loss=0.052, simple_loss=0.07171, pruned_loss=0.006299, audio_tagging_loss=0.009839, over 14811.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09133, pruned_loss=0.01241, audio_tagging_loss=0.008559, over 3051833.65 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:02:52,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2023-11-28 19:02:58,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3630186.6666666665, ans=0.2 2023-11-28 19:03:16,979 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-28 19:03:37,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. 
limit=15.0 2023-11-28 19:03:44,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630453.3333333335, ans=0.1 2023-11-28 19:03:50,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3630453.3333333335, ans=0.0 2023-11-28 19:03:53,805 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3500, loss[loss=0.05625, simple_loss=0.06244, pruned_loss=0.01626, audio_tagging_loss=0.008774, over 16395.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.0909, pruned_loss=0.01237, audio_tagging_loss=0.008478, over 3056905.21 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:03:58,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3630520.0, ans=0.0 2023-11-28 19:04:05,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3630586.6666666665, ans=0.2 2023-11-28 19:04:06,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.809e+01 9.584e+01 1.024e+02 1.310e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:04:10,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-28 19:04:11,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-28 19:04:13,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3630586.6666666665, ans=0.1 2023-11-28 19:04:18,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-28 19:04:27,713 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:04:39,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2023-11-28 19:04:49,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2023-11-28 19:04:56,014 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3550, loss[loss=0.0728, simple_loss=0.1066, pruned_loss=0.01183, audio_tagging_loss=0.007651, over 15253.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0905, pruned_loss=0.0123, audio_tagging_loss=0.008508, over 3052392.19 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:05:16,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.65 vs. 
limit=10.0 2023-11-28 19:05:21,020 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-28 19:05:36,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3631053.3333333335, ans=0.0 2023-11-28 19:05:41,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3631053.3333333335, ans=0.125 2023-11-28 19:05:58,342 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3600, loss[loss=0.0549, simple_loss=0.06344, pruned_loss=0.01169, audio_tagging_loss=0.01149, over 14580.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08968, pruned_loss=0.01224, audio_tagging_loss=0.008528, over 3046751.45 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:06:06,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3631186.6666666665, ans=0.125 2023-11-28 19:06:12,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.905e+01 9.661e+01 1.038e+02 1.227e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:06:13,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3631253.3333333335, ans=0.1 2023-11-28 19:06:23,105 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-28 19:07:00,627 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3650, loss[loss=0.07418, simple_loss=0.09701, pruned_loss=0.01479, audio_tagging_loss=0.01089, over 15551.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09002, pruned_loss=0.01229, audio_tagging_loss=0.00845, over 3046618.33 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:07:05,502 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:07:05,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-28 19:07:08,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3631520.0, ans=0.125 2023-11-28 19:07:11,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-11-28 19:07:25,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-28 19:07:34,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3631653.3333333335, ans=0.09899494936611666 2023-11-28 19:07:45,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631720.0, ans=0.1 2023-11-28 19:07:53,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3631786.6666666665, ans=0.2 2023-11-28 19:08:01,846 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3700, loss[loss=0.04781, simple_loss=0.05921, pruned_loss=0.006753, audio_tagging_loss=0.01145, over 17648.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09034, pruned_loss=0.01232, audio_tagging_loss=0.008371, over 3049741.08 frames. 
], batch size: 69, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:08:06,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3631853.3333333335, ans=0.125 2023-11-28 19:08:10,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3631853.3333333335, ans=0.125 2023-11-28 19:08:15,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 9.135e+01 9.668e+01 1.042e+02 1.211e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 19:08:25,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2023-11-28 19:08:27,290 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-28 19:08:28,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3631986.6666666665, ans=0.1 2023-11-28 19:08:31,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3631986.6666666665, ans=0.0 2023-11-28 19:08:47,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3632053.3333333335, ans=0.0 2023-11-28 19:09:05,307 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3750, loss[loss=0.06919, simple_loss=0.09475, pruned_loss=0.01357, audio_tagging_loss=0.008245, over 16284.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09054, pruned_loss=0.01223, audio_tagging_loss=0.008454, over 3057455.36 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:09:14,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632186.6666666665, ans=0.1 2023-11-28 19:09:19,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3632253.3333333335, ans=0.125 2023-11-28 19:09:22,212 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:09:30,378 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-28 19:09:31,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3632320.0, ans=0.2 2023-11-28 19:09:41,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3632386.6666666665, ans=0.125 2023-11-28 19:09:50,429 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:10:08,041 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3800, loss[loss=0.07019, simple_loss=0.1053, pruned_loss=0.01, audio_tagging_loss=0.007533, over 14950.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08995, pruned_loss=0.01199, audio_tagging_loss=0.008525, over 3055833.74 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:10:20,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 8.975e+01 9.556e+01 1.041e+02 1.200e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:10:28,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3632586.6666666665, ans=0.125 2023-11-28 19:10:32,372 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-28 19:10:53,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3632720.0, ans=0.125 2023-11-28 19:11:00,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3632786.6666666665, ans=0.2 2023-11-28 19:11:08,702 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3850, loss[loss=0.06346, simple_loss=0.08897, pruned_loss=0.01164, audio_tagging_loss=0.007335, over 15463.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09007, pruned_loss=0.01206, audio_tagging_loss=0.008606, over 3049803.79 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:11:10,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-28 19:11:34,481 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-28 19:11:37,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3632986.6666666665, ans=0.125 2023-11-28 19:11:56,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3633053.3333333335, ans=0.125 2023-11-28 19:12:08,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3633120.0, ans=0.125 2023-11-28 19:12:08,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3633120.0, ans=0.0 2023-11-28 19:12:11,428 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3900, loss[loss=0.0513, simple_loss=0.07552, pruned_loss=0.006857, audio_tagging_loss=0.006681, over 14728.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08909, pruned_loss=0.01203, audio_tagging_loss=0.008769, over 3042602.49 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:12:21,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633186.6666666665, ans=0.1 2023-11-28 19:12:26,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.937e+01 9.555e+01 1.040e+02 1.282e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:12:36,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-28 19:13:03,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-28 19:13:13,849 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 3950, loss[loss=0.05769, simple_loss=0.0781, pruned_loss=0.01008, audio_tagging_loss=0.008564, over 16045.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08922, pruned_loss=0.012, audio_tagging_loss=0.008718, over 3040550.46 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:13:15,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3633520.0, ans=0.2 2023-11-28 19:13:17,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3633520.0, ans=0.125 2023-11-28 19:13:26,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=12.0 2023-11-28 19:13:38,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-28 19:13:42,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3633653.3333333335, ans=0.2 2023-11-28 19:14:05,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3633786.6666666665, ans=0.125 2023-11-28 19:14:07,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3633786.6666666665, ans=0.025 2023-11-28 19:14:15,557 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4000, loss[loss=0.04817, simple_loss=0.06107, pruned_loss=0.007093, audio_tagging_loss=0.01054, over 15860.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08925, pruned_loss=0.01214, audio_tagging_loss=0.00876, over 3041753.38 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:14:28,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3633920.0, ans=0.125 2023-11-28 19:14:30,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 9.097e+01 9.894e+01 1.091e+02 1.423e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 19:14:31,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3633920.0, ans=0.125 2023-11-28 19:14:40,551 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-28 19:14:56,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=12.0 2023-11-28 19:15:08,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-28 19:15:17,446 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4050, loss[loss=0.08235, simple_loss=0.1198, pruned_loss=0.0148, audio_tagging_loss=0.007676, over 15306.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08908, pruned_loss=0.01213, audio_tagging_loss=0.008883, over 3039098.27 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:15:22,311 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 19:15:43,041 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-28 19:15:52,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3634320.0, ans=0.0 2023-11-28 19:16:03,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3634386.6666666665, ans=0.2 2023-11-28 19:16:07,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3634453.3333333335, ans=0.04949747468305833 2023-11-28 19:16:13,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3634453.3333333335, ans=0.125 2023-11-28 19:16:14,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3634453.3333333335, ans=0.125 2023-11-28 19:16:19,699 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4100, loss[loss=0.06291, simple_loss=0.08438, pruned_loss=0.01177, audio_tagging_loss=0.008949, over 14679.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08958, pruned_loss=0.01215, audio_tagging_loss=0.008768, over 3040949.71 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:16:34,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 9.021e+01 9.586e+01 1.040e+02 1.361e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:16:43,581 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-28 19:17:10,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3634786.6666666665, ans=0.125 2023-11-28 19:17:21,141 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4150, loss[loss=0.05869, simple_loss=0.08049, pruned_loss=0.008526, audio_tagging_loss=0.00992, over 15037.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08994, pruned_loss=0.01228, audio_tagging_loss=0.008683, over 3037764.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:17:37,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3634920.0, ans=0.125 2023-11-28 19:17:45,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-28 19:17:47,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634986.6666666665, ans=0.1 2023-11-28 19:18:08,595 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:18:14,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3635120.0, ans=0.2 2023-11-28 19:18:16,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.08 vs. 
limit=15.0 2023-11-28 19:18:18,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3635120.0, ans=0.0 2023-11-28 19:18:19,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3635120.0, ans=0.125 2023-11-28 19:18:22,699 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4200, loss[loss=0.05427, simple_loss=0.07411, pruned_loss=0.009486, audio_tagging_loss=0.007729, over 15776.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09004, pruned_loss=0.01228, audio_tagging_loss=0.008559, over 3040868.60 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:18:29,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3635186.6666666665, ans=0.2 2023-11-28 19:18:37,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.058e+01 9.549e+01 9.941e+01 2.004e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-28 19:18:48,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-28 19:18:49,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3635320.0, ans=0.125 2023-11-28 19:19:20,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3635453.3333333335, ans=0.125 2023-11-28 19:19:25,334 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4250, loss[loss=0.06976, simple_loss=0.09351, pruned_loss=0.01373, audio_tagging_loss=0.009278, over 16453.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08991, pruned_loss=0.01214, audio_tagging_loss=0.008603, over 3046211.41 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:19:51,020 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-28 19:20:07,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3635720.0, ans=0.125 2023-11-28 19:20:20,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3635786.6666666665, ans=0.0 2023-11-28 19:20:28,742 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4300, loss[loss=0.06376, simple_loss=0.0948, pruned_loss=0.01034, audio_tagging_loss=0.006013, over 16959.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09031, pruned_loss=0.01231, audio_tagging_loss=0.008512, over 3049654.68 frames. 
], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:20:32,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3635853.3333333335, ans=0.0 2023-11-28 19:20:42,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 9.029e+01 9.603e+01 1.044e+02 1.295e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:20:52,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-28 19:20:56,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3635986.6666666665, ans=0.0 2023-11-28 19:21:05,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3636053.3333333335, ans=0.2 2023-11-28 19:21:29,154 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4350, loss[loss=0.06566, simple_loss=0.09219, pruned_loss=0.01254, audio_tagging_loss=0.007021, over 14929.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09183, pruned_loss=0.01257, audio_tagging_loss=0.008392, over 3043872.19 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:21:40,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3636253.3333333335, ans=0.125 2023-11-28 19:21:41,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636253.3333333335, ans=0.125 2023-11-28 19:21:50,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3636253.3333333335, ans=0.2 2023-11-28 19:21:54,064 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-28 19:22:04,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636320.0, ans=0.125 2023-11-28 19:22:07,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=22.5 2023-11-28 19:22:09,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2023-11-28 19:22:13,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3636386.6666666665, ans=10.0 2023-11-28 19:22:20,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3636453.3333333335, ans=0.125 2023-11-28 19:22:31,076 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4400, loss[loss=0.06679, simple_loss=0.08218, pruned_loss=0.01479, audio_tagging_loss=0.01091, over 13963.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09088, pruned_loss=0.01242, audio_tagging_loss=0.008415, over 3039174.84 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:22:38,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636520.0, ans=0.1 2023-11-28 19:22:46,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.163e+01 9.666e+01 1.055e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 19:22:49,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3636586.6666666665, ans=0.125 2023-11-28 19:22:56,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-28 19:23:02,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3636653.3333333335, ans=0.2 2023-11-28 19:23:07,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3636720.0, ans=0.125 2023-11-28 19:23:17,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.44 vs. limit=10.0 2023-11-28 19:23:23,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3636786.6666666665, ans=0.125 2023-11-28 19:23:33,507 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4450, loss[loss=0.05637, simple_loss=0.06935, pruned_loss=0.01179, audio_tagging_loss=0.009907, over 14456.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09037, pruned_loss=0.01224, audio_tagging_loss=0.008365, over 3044985.58 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:23:52,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3636920.0, ans=0.125 2023-11-28 19:23:56,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3636920.0, ans=0.2 2023-11-28 19:23:58,963 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-28 19:24:35,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-11-28 19:24:35,799 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4500, loss[loss=0.07858, simple_loss=0.1089, pruned_loss=0.01571, audio_tagging_loss=0.008411, over 15703.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09006, pruned_loss=0.01223, audio_tagging_loss=0.008387, over 3050131.10 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:24:36,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3637186.6666666665, ans=0.125 2023-11-28 19:24:40,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3637186.6666666665, ans=22.5 2023-11-28 19:24:48,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3637253.3333333335, ans=0.0 2023-11-28 19:24:50,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.752e+01 9.380e+01 1.023e+02 1.206e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 19:25:00,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-28 19:25:08,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2023-11-28 19:25:09,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3637320.0, ans=0.0 2023-11-28 19:25:18,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2023-11-28 19:25:35,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3637453.3333333335, ans=0.125 2023-11-28 19:25:38,425 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4550, loss[loss=0.05967, simple_loss=0.07783, pruned_loss=0.01222, audio_tagging_loss=0.008534, over 14513.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08907, pruned_loss=0.01209, audio_tagging_loss=0.008421, over 3043000.75 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:25:55,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3637586.6666666665, ans=0.0 2023-11-28 19:25:57,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3637586.6666666665, ans=0.125 2023-11-28 19:25:59,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3637586.6666666665, ans=0.1 2023-11-28 19:26:03,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-28 19:26:13,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3637653.3333333335, ans=0.125 2023-11-28 19:26:28,094 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:26:40,930 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4600, loss[loss=0.08337, simple_loss=0.1129, pruned_loss=0.018, audio_tagging_loss=0.008919, over 16186.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08863, pruned_loss=0.01203, audio_tagging_loss=0.008586, over 3040722.82 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:26:43,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3637853.3333333335, ans=0.125 2023-11-28 19:26:46,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3637853.3333333335, ans=0.125 2023-11-28 19:26:52,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3637920.0, ans=0.125 2023-11-28 19:26:55,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.776e+01 9.447e+01 1.031e+02 1.407e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 19:27:05,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-28 19:27:12,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3637986.6666666665, ans=0.2 2023-11-28 19:27:33,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3638120.0, ans=0.0 2023-11-28 19:27:34,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3638120.0, ans=0.125 2023-11-28 19:27:42,016 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4650, loss[loss=0.06637, simple_loss=0.0934, pruned_loss=0.01176, audio_tagging_loss=0.007903, over 14043.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08865, pruned_loss=0.01204, audio_tagging_loss=0.008759, over 3034971.16 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:27:44,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-28 19:27:49,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2023-11-28 19:27:57,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3638253.3333333335, ans=0.2 2023-11-28 19:28:06,585 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-28 19:28:18,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3638386.6666666665, ans=0.1 2023-11-28 19:28:35,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3638453.3333333335, ans=0.125 2023-11-28 19:28:43,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3638520.0, ans=0.2 2023-11-28 19:28:44,203 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4700, loss[loss=0.06142, simple_loss=0.08375, pruned_loss=0.01068, audio_tagging_loss=0.008864, over 16377.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0895, pruned_loss=0.01239, audio_tagging_loss=0.008775, over 3036334.63 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:28:58,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.17 vs. 
limit=15.0 2023-11-28 19:29:00,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 9.174e+01 9.774e+01 1.029e+02 1.399e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-28 19:29:09,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-28 19:29:21,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3638720.0, ans=0.125 2023-11-28 19:29:47,390 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4750, loss[loss=0.07354, simple_loss=0.09503, pruned_loss=0.01712, audio_tagging_loss=0.008907, over 15760.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08835, pruned_loss=0.01226, audio_tagging_loss=0.008866, over 3035848.62 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:29:49,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-28 19:29:57,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-11-28 19:30:08,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3638920.0, ans=0.125 2023-11-28 19:30:11,908 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-28 19:30:41,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3639120.0, ans=0.125 2023-11-28 19:30:48,681 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4800, loss[loss=0.06511, simple_loss=0.08365, pruned_loss=0.01129, audio_tagging_loss=0.01199, over 14877.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08892, pruned_loss=0.01229, audio_tagging_loss=0.008925, over 3041652.34 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:30:51,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3639186.6666666665, ans=0.2 2023-11-28 19:31:05,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.815e+01 9.365e+01 1.036e+02 1.386e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 19:31:10,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0 2023-11-28 19:31:14,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-28 19:31:17,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3639320.0, ans=0.1 2023-11-28 19:31:33,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3639386.6666666665, ans=0.125 2023-11-28 19:31:38,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3639453.3333333335, ans=0.125 2023-11-28 19:31:47,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3639453.3333333335, ans=0.0 2023-11-28 19:31:51,020 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4850, loss[loss=0.06756, simple_loss=0.09848, pruned_loss=0.009957, audio_tagging_loss=0.008363, over 15671.00 frames. 
], tot_loss[loss=0.06526, simple_loss=0.0883, pruned_loss=0.01211, audio_tagging_loss=0.009001, over 3036328.69 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:31:58,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3639520.0, ans=0.125 2023-11-28 19:32:01,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3639520.0, ans=0.05 2023-11-28 19:32:11,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3639586.6666666665, ans=0.1 2023-11-28 19:32:15,693 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-28 19:32:20,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3639653.3333333335, ans=0.125 2023-11-28 19:32:29,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-28 19:32:30,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3639720.0, ans=0.0 2023-11-28 19:32:36,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639720.0, ans=0.1 2023-11-28 19:32:45,401 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:32:52,929 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4900, loss[loss=0.07755, simple_loss=0.1085, pruned_loss=0.01641, audio_tagging_loss=0.006903, over 14693.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08909, pruned_loss=0.01209, audio_tagging_loss=0.008973, over 3034357.06 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:32:57,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3639853.3333333335, ans=0.0 2023-11-28 19:32:57,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.74 vs. 
limit=15.0 2023-11-28 19:33:00,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3639853.3333333335, ans=0.025 2023-11-28 19:33:10,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.839e+01 9.451e+01 1.014e+02 1.484e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 19:33:17,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-28 19:33:40,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3640053.3333333335, ans=0.125 2023-11-28 19:33:41,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3640120.0, ans=0.125 2023-11-28 19:33:43,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3640120.0, ans=0.0 2023-11-28 19:33:45,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3640120.0, ans=0.2 2023-11-28 19:33:54,888 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 4950, loss[loss=0.06209, simple_loss=0.08944, pruned_loss=0.01065, audio_tagging_loss=0.006726, over 14794.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08965, pruned_loss=0.01213, audio_tagging_loss=0.008769, over 3030366.93 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:33:55,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3640186.6666666665, ans=0.1 2023-11-28 19:34:01,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3640186.6666666665, ans=0.0 2023-11-28 19:34:13,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3640253.3333333335, ans=0.1 2023-11-28 19:34:15,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3640253.3333333335, ans=0.0 2023-11-28 19:34:19,512 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-28 19:34:20,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3640320.0, ans=0.0 2023-11-28 19:34:28,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3640320.0, ans=0.125 2023-11-28 19:34:37,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3640386.6666666665, ans=0.0 2023-11-28 19:34:37,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3640386.6666666665, ans=0.125 2023-11-28 19:34:51,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-28 19:34:55,723 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5000, loss[loss=0.08215, simple_loss=0.1191, pruned_loss=0.01482, audio_tagging_loss=0.007792, over 15476.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08995, pruned_loss=0.01211, audio_tagging_loss=0.00865, over 3032025.82 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:35:06,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0 2023-11-28 19:35:07,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-28 19:35:13,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.947e+01 9.568e+01 1.019e+02 1.168e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:35:21,235 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-28 19:35:31,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3640720.0, ans=0.125 2023-11-28 19:35:54,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5 2023-11-28 19:35:58,105 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5050, loss[loss=0.07897, simple_loss=0.1061, pruned_loss=0.01224, audio_tagging_loss=0.01367, over 15059.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08958, pruned_loss=0.01214, audio_tagging_loss=0.008666, over 3034315.10 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:36:00,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3640853.3333333335, ans=0.0 2023-11-28 19:36:01,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3640853.3333333335, ans=0.0 2023-11-28 19:36:09,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3640920.0, ans=0.125 2023-11-28 19:36:11,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2023-11-28 19:36:14,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.62 vs. limit=22.5 2023-11-28 19:36:18,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3640920.0, ans=0.125 2023-11-28 19:36:22,353 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-28 19:36:26,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3640986.6666666665, ans=0.125 2023-11-28 19:36:44,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3641053.3333333335, ans=0.5 2023-11-28 19:36:55,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.62 vs. limit=10.0 2023-11-28 19:37:00,151 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5100, loss[loss=0.06389, simple_loss=0.08264, pruned_loss=0.01369, audio_tagging_loss=0.008881, over 13852.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08896, pruned_loss=0.01202, audio_tagging_loss=0.008663, over 3045389.62 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:37:07,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. limit=10.0 2023-11-28 19:37:17,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.885e+01 9.689e+01 1.021e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 19:37:23,797 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:37:24,849 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-28 19:37:27,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3641320.0, ans=0.125 2023-11-28 19:37:36,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3641386.6666666665, ans=0.0 2023-11-28 19:37:41,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3641386.6666666665, ans=10.0 2023-11-28 19:37:45,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3641386.6666666665, ans=0.0 2023-11-28 19:38:01,238 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5150, loss[loss=0.05489, simple_loss=0.07551, pruned_loss=0.008535, audio_tagging_loss=0.008601, over 15539.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08839, pruned_loss=0.01189, audio_tagging_loss=0.008675, over 3036215.33 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:38:03,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3641520.0, ans=0.125 2023-11-28 19:38:09,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3641520.0, ans=15.0 2023-11-28 19:38:27,211 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-28 19:38:34,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3641653.3333333335, ans=0.5 2023-11-28 19:38:39,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3641720.0, ans=0.125 2023-11-28 19:38:51,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3641786.6666666665, ans=0.2 2023-11-28 19:38:55,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3641786.6666666665, ans=0.125 2023-11-28 19:39:04,449 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5200, loss[loss=0.04987, simple_loss=0.06268, pruned_loss=0.005538, audio_tagging_loss=0.013, over 13633.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08877, pruned_loss=0.01198, audio_tagging_loss=0.008595, over 3033181.88 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:39:22,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 9.099e+01 9.660e+01 1.024e+02 1.324e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:39:24,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.91 vs. 
limit=15.0 2023-11-28 19:39:28,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-28 19:39:43,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3642053.3333333335, ans=0.0 2023-11-28 19:40:06,062 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5250, loss[loss=0.0672, simple_loss=0.09617, pruned_loss=0.01232, audio_tagging_loss=0.006791, over 16759.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08895, pruned_loss=0.01213, audio_tagging_loss=0.008512, over 3037135.90 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:40:11,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3642186.6666666665, ans=0.0 2023-11-28 19:40:22,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-28 19:40:25,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3642253.3333333335, ans=0.0 2023-11-28 19:40:30,158 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-28 19:40:38,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3642320.0, ans=0.125 2023-11-28 19:40:44,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3642386.6666666665, ans=0.0 2023-11-28 19:40:51,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3642386.6666666665, ans=0.125 2023-11-28 19:40:53,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3642386.6666666665, ans=0.125 2023-11-28 19:40:54,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3642453.3333333335, ans=0.125 2023-11-28 19:40:59,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0 2023-11-28 19:41:02,392 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:41:06,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3642520.0, ans=0.125 2023-11-28 19:41:06,888 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5300, loss[loss=0.05142, simple_loss=0.07105, pruned_loss=0.006468, audio_tagging_loss=0.009429, over 15294.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08983, pruned_loss=0.01219, audio_tagging_loss=0.008474, over 3036992.84 frames. 
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:41:25,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 9.027e+01 9.738e+01 1.069e+02 1.273e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 19:41:31,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-28 19:41:34,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3642653.3333333335, ans=0.125 2023-11-28 19:41:50,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3642720.0, ans=0.125 2023-11-28 19:41:53,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3642720.0, ans=0.125 2023-11-28 19:42:03,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3642786.6666666665, ans=0.125 2023-11-28 19:42:08,147 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5350, loss[loss=0.06016, simple_loss=0.08178, pruned_loss=0.01025, audio_tagging_loss=0.009016, over 15037.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08986, pruned_loss=0.01218, audio_tagging_loss=0.008517, over 3035549.65 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:42:09,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3642853.3333333335, ans=0.125 2023-11-28 19:42:11,305 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:42:14,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3642853.3333333335, ans=0.0 2023-11-28 19:42:19,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3642853.3333333335, ans=10.0 2023-11-28 19:42:33,668 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-28 19:42:53,189 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:43:03,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3643120.0, ans=0.125 2023-11-28 19:43:10,560 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5400, loss[loss=0.05972, simple_loss=0.07602, pruned_loss=0.01168, audio_tagging_loss=0.01003, over 14078.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08941, pruned_loss=0.01205, audio_tagging_loss=0.008598, over 3032237.67 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:43:18,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3643186.6666666665, ans=0.125 2023-11-28 19:43:22,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3643253.3333333335, ans=0.125 2023-11-28 19:43:28,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 9.157e+01 9.833e+01 1.043e+02 1.444e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 19:43:34,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-28 19:43:41,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3643320.0, ans=0.0 2023-11-28 19:43:48,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=15.0 2023-11-28 19:43:49,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3643386.6666666665, ans=0.2 2023-11-28 19:43:58,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3643386.6666666665, ans=0.0 2023-11-28 19:44:00,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3643453.3333333335, ans=0.0 2023-11-28 19:44:12,623 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5450, loss[loss=0.06626, simple_loss=0.09369, pruned_loss=0.01277, audio_tagging_loss=0.006642, over 14770.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0895, pruned_loss=0.01218, audio_tagging_loss=0.008576, over 3029218.31 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:44:37,573 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-28 19:44:40,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=8.0 2023-11-28 19:44:55,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3643720.0, ans=0.0 2023-11-28 19:44:59,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3643720.0, ans=0.2 2023-11-28 19:45:06,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3643786.6666666665, ans=0.0 2023-11-28 19:45:10,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3643786.6666666665, ans=0.09899494936611666 2023-11-28 19:45:11,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3643786.6666666665, ans=0.1 2023-11-28 19:45:14,846 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5500, loss[loss=0.07229, simple_loss=0.09446, pruned_loss=0.01263, audio_tagging_loss=0.01243, over 15591.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08992, pruned_loss=0.01222, audio_tagging_loss=0.008588, over 3025942.56 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:45:15,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3643853.3333333335, ans=0.125 2023-11-28 19:45:25,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3643853.3333333335, ans=0.0 2023-11-28 19:45:34,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.789e+01 9.570e+01 1.015e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:45:38,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2023-11-28 19:45:39,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3643986.6666666665, ans=0.1 2023-11-28 19:45:40,495 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-28 19:45:46,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3643986.6666666665, ans=0.2 2023-11-28 19:45:54,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-28 19:46:11,487 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:46:17,802 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5550, loss[loss=0.05859, simple_loss=0.08443, pruned_loss=0.006652, audio_tagging_loss=0.009718, over 15904.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08942, pruned_loss=0.01204, audio_tagging_loss=0.008698, over 3023855.55 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:46:27,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3644186.6666666665, ans=22.5 2023-11-28 19:46:29,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3644253.3333333335, ans=0.125 2023-11-28 19:46:41,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-28 19:46:50,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3644320.0, ans=0.1 2023-11-28 19:46:57,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3644386.6666666665, ans=0.0 2023-11-28 19:47:03,149 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:47:03,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3644386.6666666665, ans=0.1 2023-11-28 19:47:08,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3644453.3333333335, ans=0.2 2023-11-28 19:47:18,554 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5600, loss[loss=0.07287, simple_loss=0.0913, pruned_loss=0.01692, audio_tagging_loss=0.0103, over 15585.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09027, pruned_loss=0.01211, audio_tagging_loss=0.008705, over 3032839.49 frames. 
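Each train_asr.py:1235 record decomposes both the per-batch loss[...] and the running tot_loss[...] into simple_loss (the unpruned transducer term), pruned_loss, and audio_tagging_loss. The printed totals are consistent with the simple term entering at half weight and the other two at full weight; checking against the batch-5550 averages just above:

# tot_loss components at epoch 46, batch 5550 (from the log line above)
simple, pruned, tagging = 0.08942, 0.01204, 0.008698
total = 0.5 * simple + pruned + tagging   # assumed weighting
print(round(total, 5))                    # 0.06545 -- matches the logged tot_loss

The 0.5 factor is an inference from the printed numbers, not something the log states explicitly.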
], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:47:29,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2023-11-28 19:47:31,159 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:47:38,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.943e+01 9.603e+01 1.015e+02 1.818e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:47:43,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-28 19:47:54,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3644653.3333333335, ans=0.125 2023-11-28 19:48:01,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3644720.0, ans=22.5 2023-11-28 19:48:04,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3644720.0, ans=0.125 2023-11-28 19:48:05,773 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:48:07,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3644786.6666666665, ans=0.125 2023-11-28 19:48:07,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3644786.6666666665, ans=0.0 2023-11-28 19:48:14,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3644786.6666666665, ans=0.125 2023-11-28 19:48:20,619 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5650, loss[loss=0.08107, simple_loss=0.1126, pruned_loss=0.0178, audio_tagging_loss=0.006967, over 15383.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09032, pruned_loss=0.01226, audio_tagging_loss=0.008814, over 3038133.00 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:48:22,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.55 vs. limit=6.0 2023-11-28 19:48:32,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3644920.0, ans=0.1 2023-11-28 19:48:45,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-28 19:48:53,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3644986.6666666665, ans=0.125 2023-11-28 19:49:05,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3645053.3333333335, ans=0.125 2023-11-28 19:49:11,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.42 vs. 
limit=12.0 2023-11-28 19:49:21,870 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5700, loss[loss=0.05853, simple_loss=0.0738, pruned_loss=0.01019, audio_tagging_loss=0.01144, over 15016.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08948, pruned_loss=0.01216, audio_tagging_loss=0.008869, over 3039241.96 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:49:32,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3645186.6666666665, ans=0.125 2023-11-28 19:49:41,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.661e+01 9.434e+01 1.006e+02 1.407e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 19:49:45,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3645320.0, ans=0.125 2023-11-28 19:49:46,731 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-28 19:50:13,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3645453.3333333335, ans=0.125 2023-11-28 19:50:21,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-28 19:50:24,559 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5750, loss[loss=0.07124, simple_loss=0.0871, pruned_loss=0.01709, audio_tagging_loss=0.0106, over 15671.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08973, pruned_loss=0.01224, audio_tagging_loss=0.008794, over 3042951.83 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:50:28,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-11-28 19:50:49,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-28 19:50:54,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3645653.3333333335, ans=0.2 2023-11-28 19:50:55,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3645653.3333333335, ans=0.125 2023-11-28 19:51:07,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-11-28 19:51:23,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3645786.6666666665, ans=0.125 2023-11-28 19:51:26,956 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5800, loss[loss=0.05208, simple_loss=0.07113, pruned_loss=0.008546, audio_tagging_loss=0.007968, over 16578.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09041, pruned_loss=0.01231, audio_tagging_loss=0.008623, over 3042293.84 frames. 
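The WARNING above (cut unbalanced/ze0LsBtoDm0_0.000_1.000.wav) shows the data filter at work: a 1-second AudioSet clip has 100 feature frames, only 23 after the 4x subsampling, which is fewer than the 24 BPE tokens of its dummy placeholder transcript. A transducer loss cannot align fewer encoder frames than output tokens, so the cut is dropped. The check reduces to something like this (a sketch; the real filter lives in train_asr.py around line 1481):

def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
    # The transducer needs at least one encoder frame per output token.
    return num_frames_after_subsampling >= num_tokens

keep_cut(23, 24)   # False -> excluded, exactly as in the warning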
], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:51:35,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3645853.3333333335, ans=0.1 2023-11-28 19:51:46,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.935e+01 9.736e+01 1.038e+02 1.216e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 19:51:49,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3645920.0, ans=0.125 2023-11-28 19:51:51,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-28 19:51:53,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3645986.6666666665, ans=0.0 2023-11-28 19:52:14,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3646053.3333333335, ans=0.0 2023-11-28 19:52:22,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3646120.0, ans=0.0 2023-11-28 19:52:28,735 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5850, loss[loss=0.0601, simple_loss=0.08012, pruned_loss=0.01021, audio_tagging_loss=0.009833, over 16005.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09017, pruned_loss=0.01216, audio_tagging_loss=0.008555, over 3039173.44 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:52:29,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3646186.6666666665, ans=0.125 2023-11-28 19:52:40,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3646253.3333333335, ans=0.125 2023-11-28 19:52:50,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2023-11-28 19:52:53,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-28 19:52:54,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3646320.0, ans=0.125 2023-11-28 19:52:57,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3646320.0, ans=0.0 2023-11-28 19:53:00,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-28 19:53:09,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.33 vs. limit=10.0 2023-11-28 19:53:23,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3646453.3333333335, ans=0.125 2023-11-28 19:53:30,693 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5900, loss[loss=0.05934, simple_loss=0.07442, pruned_loss=0.01156, audio_tagging_loss=0.01057, over 15042.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09056, pruned_loss=0.01209, audio_tagging_loss=0.008482, over 3038309.20 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:53:32,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3646520.0, ans=0.125 2023-11-28 19:53:41,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-28 19:53:50,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.034e+01 9.568e+01 1.043e+02 1.665e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:53:55,758 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-28 19:53:55,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646653.3333333335, ans=0.1 2023-11-28 19:54:01,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3646653.3333333335, ans=0.0 2023-11-28 19:54:05,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-11-28 19:54:06,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3646653.3333333335, ans=0.125 2023-11-28 19:54:20,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3646786.6666666665, ans=10.0 2023-11-28 19:54:33,338 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 5950, loss[loss=0.07481, simple_loss=0.1022, pruned_loss=0.01509, audio_tagging_loss=0.008599, over 14406.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09046, pruned_loss=0.01208, audio_tagging_loss=0.008553, over 3048929.60 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:54:36,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3646853.3333333335, ans=22.5 2023-11-28 19:54:41,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-28 19:54:47,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3646920.0, ans=0.125 2023-11-28 19:54:58,255 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-28 19:54:58,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.80 vs. limit=15.0 2023-11-28 19:55:09,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3647053.3333333335, ans=0.125 2023-11-28 19:55:23,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-28 19:55:35,013 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6000, loss[loss=0.07674, simple_loss=0.1146, pruned_loss=0.0117, audio_tagging_loss=0.007758, over 15234.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0905, pruned_loss=0.01203, audio_tagging_loss=0.008547, over 3044123.17 frames. 
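grad_scale oscillates between 16.0 and 32.0 in these records (32.0 at batch 5600, back to 16.0 by batch 5650, 32.0 again at batch 6000 below). That is ordinary dynamic loss scaling under fp16: the scaler doubles the scale after a fixed run of overflow-free steps and halves it whenever an inf/nan gradient shows up. A generic PyTorch sketch, with illustrative constants rather than this run's exact settings:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # matches the scale seen in the log
    growth_factor=2.0,     # double after `growth_interval` clean steps
    backoff_factor=0.5,    # halve on inf/nan gradients
    growth_interval=2000)  # illustrative value

# In the training loop (schematic):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()        # where the 16.0 <-> 32.0 transitions happen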
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:55:35,014 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 19:56:11,976 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1475, 3.9867, 3.7688, 3.2968], device='cuda:1') 2023-11-28 19:56:14,879 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05742, simple_loss=0.05049, pruned_loss=0.005198, audio_tagging_loss=0.02698, over 4681554.00 frames. 2023-11-28 19:56:14,879 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 19:56:34,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.785e+01 9.507e+01 1.045e+02 2.026e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 19:56:35,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3647253.3333333335, ans=0.0 2023-11-28 19:56:39,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-28 19:56:50,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3647320.0, ans=0.0 2023-11-28 19:56:52,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-11-28 19:57:01,547 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:57:15,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2023-11-28 19:57:16,630 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6050, loss[loss=0.06136, simple_loss=0.09298, pruned_loss=0.007977, audio_tagging_loss=0.006895, over 16294.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08884, pruned_loss=0.01181, audio_tagging_loss=0.008594, over 3038911.38 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:57:16,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3647520.0, ans=0.125 2023-11-28 19:57:38,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. 
limit=6.0 2023-11-28 19:57:41,476 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-28 19:57:41,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3647653.3333333335, ans=0.125 2023-11-28 19:57:46,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3647653.3333333335, ans=0.0 2023-11-28 19:57:57,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3647720.0, ans=0.2 2023-11-28 19:58:12,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3647786.6666666665, ans=0.0 2023-11-28 19:58:12,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3647786.6666666665, ans=0.2 2023-11-28 19:58:18,293 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6100, loss[loss=0.0636, simple_loss=0.08826, pruned_loss=0.01068, audio_tagging_loss=0.00879, over 15468.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08874, pruned_loss=0.01186, audio_tagging_loss=0.008594, over 3044414.87 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:58:26,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3647853.3333333335, ans=0.125 2023-11-28 19:58:26,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-28 19:58:32,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3647920.0, ans=0.125 2023-11-28 19:58:37,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. 
limit=15.0 2023-11-28 19:58:39,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.974e+01 9.540e+01 1.034e+02 1.321e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 19:58:40,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3647920.0, ans=0.125 2023-11-28 19:58:41,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3647986.6666666665, ans=0.125 2023-11-28 19:58:42,885 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-28 19:58:44,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3647986.6666666665, ans=0.2 2023-11-28 19:58:54,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3647986.6666666665, ans=0.0 2023-11-28 19:59:15,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3648120.0, ans=0.0 2023-11-28 19:59:20,201 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:59:20,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3648186.6666666665, ans=0.125 2023-11-28 19:59:20,958 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6150, loss[loss=0.08914, simple_loss=0.1266, pruned_loss=0.01782, audio_tagging_loss=0.00801, over 15600.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08997, pruned_loss=0.01209, audio_tagging_loss=0.008661, over 3046263.38 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:59:46,024 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-28 19:59:51,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3648320.0, ans=0.125 2023-11-28 20:00:01,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2023-11-28 20:00:05,905 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:00:09,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3648453.3333333335, ans=0.0 2023-11-28 20:00:10,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-28 20:00:21,952 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6200, loss[loss=0.06967, simple_loss=0.08945, pruned_loss=0.01689, audio_tagging_loss=0.008055, over 13804.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08891, pruned_loss=0.01207, audio_tagging_loss=0.008751, over 3042784.94 frames. 
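The scaling.py:1022 Whitening lines are diagnostics: each whitened activation reports a metric against a limit (metric=10.54 vs. limit=15.0 just above), and the whitening penalty only engages once the metric crosses its limit. Roughly, the metric measures how far the feature covariance is from isotropic: it is 1.0 when the covariance is proportional to the identity and grows as variance concentrates in a few directions. A hedged sketch in the spirit of icefall's scaling.py, not a verbatim copy:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). E[lambda^2] / E[lambda]^2 over the
    eigenvalues of the feature covariance; equals 1.0 iff covariance is
    proportional to the identity."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    trace = torch.diagonal(cov).sum()            # sum of eigenvalues
    trace_sq = torch.diagonal(cov @ cov).sum()   # sum of squared eigenvalues
    return x.shape[1] * trace_sq / (trace ** 2)

So a report like "metric=2.71 vs. limit=15.0" means that module's features are already close to white and the penalty stays inactive.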
], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:00:31,103 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:00:33,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3648586.6666666665, ans=0.125 2023-11-28 20:00:42,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.739e+01 9.716e+01 1.027e+02 1.338e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-28 20:00:44,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3648586.6666666665, ans=0.125 2023-11-28 20:00:47,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-28 20:00:47,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3648653.3333333335, ans=0.125 2023-11-28 20:01:23,808 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6250, loss[loss=0.06081, simple_loss=0.08208, pruned_loss=0.009007, audio_tagging_loss=0.01076, over 14586.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09005, pruned_loss=0.0123, audio_tagging_loss=0.008752, over 3050137.62 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:01:27,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3648853.3333333335, ans=0.125 2023-11-28 20:01:38,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2023-11-28 20:01:42,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3648920.0, ans=0.1 2023-11-28 20:01:46,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3648986.6666666665, ans=0.125 2023-11-28 20:01:47,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-28 20:01:47,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3648986.6666666665, ans=0.0 2023-11-28 20:02:00,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-11-28 20:02:25,163 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6300, loss[loss=0.07595, simple_loss=0.1049, pruned_loss=0.01225, audio_tagging_loss=0.01127, over 14706.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09059, pruned_loss=0.01233, audio_tagging_loss=0.00875, over 3052431.57 frames. 
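The scaling.py:1118 WithLoss lines attach an auxiliary loss directly to the self-attention weights; here they report loss-sum=0.000e+00 everywhere, i.e. the term currently contributes nothing, either because its scale has been scheduled to zero this late in training or because the constraint is satisfied. The underlying pattern is a pass-through op that injects an extra gradient in backward; a minimal sketch with a placeholder penalty (the real penalty and its scheduling are icefall-specific):

import torch

class WithLoss(torch.autograd.Function):
    """Identity in forward; in backward, adds the gradient of
    scale * aux_loss(x) to the incoming gradient (pattern sketch)."""
    @staticmethod
    def forward(ctx, x, scale: float):
        ctx.save_for_backward(x)
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            aux = ctx.scale * x.pow(2).mean()   # placeholder penalty
            (aux_grad,) = torch.autograd.grad(aux, x)
        return grad_output + aux_grad, None

With scale at zero, backward degenerates to the identity, which is consistent with the constant loss-sum=0.000e+00 above.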
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:02:27,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3649186.6666666665, ans=0.2 2023-11-28 20:02:45,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.912e+01 9.440e+01 1.014e+02 1.345e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 20:02:49,554 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-28 20:03:12,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3649386.6666666665, ans=0.09899494936611666 2023-11-28 20:03:15,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3649453.3333333335, ans=0.0 2023-11-28 20:03:26,072 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6350, loss[loss=0.05283, simple_loss=0.06753, pruned_loss=0.008209, audio_tagging_loss=0.01086, over 15373.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08972, pruned_loss=0.01216, audio_tagging_loss=0.008918, over 3049051.97 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:03:26,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2023-11-28 20:03:51,501 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-28 20:03:54,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3649653.3333333335, ans=0.125 2023-11-28 20:04:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3649786.6666666665, ans=0.125 2023-11-28 20:04:21,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3649786.6666666665, ans=15.0 2023-11-28 20:04:28,212 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6400, loss[loss=0.05656, simple_loss=0.07476, pruned_loss=0.00986, audio_tagging_loss=0.009322, over 14626.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08848, pruned_loss=0.0121, audio_tagging_loss=0.009033, over 3039160.64 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:04:49,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.001e+01 9.641e+01 1.036e+02 1.339e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 20:04:53,028 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-28 20:04:54,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2023-11-28 20:04:57,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3649986.6666666665, ans=0.1 2023-11-28 20:05:06,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.89 vs. 
limit=15.0 2023-11-28 20:05:10,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3650053.3333333335, ans=0.0 2023-11-28 20:05:14,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3650053.3333333335, ans=0.95 2023-11-28 20:05:15,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3650053.3333333335, ans=0.125 2023-11-28 20:05:28,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3650120.0, ans=0.125 2023-11-28 20:05:30,170 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6450, loss[loss=0.06686, simple_loss=0.08678, pruned_loss=0.01292, audio_tagging_loss=0.01055, over 15417.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08785, pruned_loss=0.01191, audio_tagging_loss=0.009166, over 3037408.33 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:05:32,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3650186.6666666665, ans=0.125 2023-11-28 20:05:39,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3650186.6666666665, ans=0.0 2023-11-28 20:05:54,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-28 20:06:03,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3650320.0, ans=0.0 2023-11-28 20:06:10,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2023-11-28 20:06:15,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3650386.6666666665, ans=0.125 2023-11-28 20:06:16,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3650386.6666666665, ans=0.0 2023-11-28 20:06:25,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.35 vs. limit=10.0 2023-11-28 20:06:25,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-28 20:06:30,897 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6500, loss[loss=0.06823, simple_loss=0.08822, pruned_loss=0.01459, audio_tagging_loss=0.009531, over 16286.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08803, pruned_loss=0.01189, audio_tagging_loss=0.009054, over 3035424.16 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:06:53,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3650586.6666666665, ans=0.2 2023-11-28 20:06:54,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.749e+01 9.341e+01 9.951e+01 1.412e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 20:06:56,592 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-28 20:07:33,360 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6550, loss[loss=0.04861, simple_loss=0.065, pruned_loss=0.007103, audio_tagging_loss=0.009003, over 15872.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08846, pruned_loss=0.01198, audio_tagging_loss=0.008918, over 3037223.78 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:07:40,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-28 20:07:58,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-28 20:08:14,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3651053.3333333335, ans=0.09899494936611666 2023-11-28 20:08:14,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3651053.3333333335, ans=0.125 2023-11-28 20:08:16,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3651053.3333333335, ans=0.125 2023-11-28 20:08:22,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3651120.0, ans=0.125 2023-11-28 20:08:33,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3651120.0, ans=0.0 2023-11-28 20:08:35,779 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6600, loss[loss=0.05078, simple_loss=0.06428, pruned_loss=0.009185, audio_tagging_loss=0.009453, over 15498.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08882, pruned_loss=0.01192, audio_tagging_loss=0.00872, over 3032870.13 frames. 
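Several schedules above govern the Zipformer bypass connections (bypass.scale_min and bypass_mid.scale_min at 0.2, bypass.skip_rate ≈ 0.099): each block mixes its input with its output through a learned per-channel scale that is clamped from below, and during training a block can additionally be skipped outright with a small probability. A simplified sketch of that mixing; the clamping range and the exact skip mechanics are assumptions:

import torch

def bypass(x_in, x_out, scale, scale_min=0.2, skip_rate=0.099, training=True):
    """y = x_in + s * (x_out - x_in) with s clamped to [scale_min, 1.0];
    with probability skip_rate the block output is dropped entirely."""
    if training and torch.rand(()) < skip_rate:
        return x_in                               # stochastic-depth-style skip
    s = scale.clamp(min=scale_min, max=1.0)       # per-channel mixing weight
    return x_in + s * (x_out - x_in)

With scale_min=0.2 a block's contribution can never be attenuated below a 0.2 share, which keeps gradients flowing through the deep stack.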
], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:08:44,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3651186.6666666665, ans=0.125 2023-11-28 20:08:53,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3651253.3333333335, ans=0.025 2023-11-28 20:08:58,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.944e+01 9.598e+01 1.039e+02 1.454e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 20:09:01,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-28 20:09:15,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651386.6666666665, ans=0.1 2023-11-28 20:09:16,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3651386.6666666665, ans=0.125 2023-11-28 20:09:21,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651386.6666666665, ans=0.1 2023-11-28 20:09:34,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3651453.3333333335, ans=0.2 2023-11-28 20:09:38,402 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6650, loss[loss=0.05777, simple_loss=0.07671, pruned_loss=0.01188, audio_tagging_loss=0.007544, over 15530.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08822, pruned_loss=0.01186, audio_tagging_loss=0.008634, over 3039708.28 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:09:52,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.57 vs. limit=10.0 2023-11-28 20:10:03,183 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-28 20:10:05,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3651653.3333333335, ans=15.0 2023-11-28 20:10:11,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=15.0 2023-11-28 20:10:19,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3651720.0, ans=0.125 2023-11-28 20:10:21,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3651720.0, ans=0.04949747468305833 2023-11-28 20:10:39,471 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6700, loss[loss=0.0473, simple_loss=0.06042, pruned_loss=0.008946, audio_tagging_loss=0.008145, over 15083.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08798, pruned_loss=0.01174, audio_tagging_loss=0.008553, over 3037528.64 frames. 
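The balancer entries (…balancer1.prob ans=0.125, …balancer1.min_positive ans=0.025, …balancer1.max_abs ans=10.0, …balancer.max_positive ans=0.95) come from modules that keep per-channel activation statistics inside a target range: on a random subset of batches (probability prob), each channel's fraction of positive values and mean absolute value are checked against bounds such as min_positive and max_abs, and gradients are nudged for channels that violate them; the forward pass is the identity. A rough sketch of just the constraint check, with the gradient modification omitted (the full implementation in scaling.py is considerably more involved):

import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.025,
                        max_positive: float = 0.95,
                        max_abs: float = 10.0):
    """x: (num_frames, num_channels). Flags channels a Balancer
    would push back on (sketch only)."""
    frac_pos = (x > 0).float().mean(dim=0)   # fraction of positive values
    mean_abs = x.abs().mean(dim=0)           # mean |activation|
    return {
        "too_few_positive": frac_pos < min_positive,
        "too_many_positive": frac_pos > max_positive,
        "too_large": mean_abs > max_abs,
    }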
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:10:45,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651853.3333333335, ans=0.1 2023-11-28 20:11:00,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3651920.0, ans=0.0 2023-11-28 20:11:02,335 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.565e+01 9.256e+01 9.960e+01 1.372e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 20:11:04,752 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-28 20:11:17,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2023-11-28 20:11:23,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3652053.3333333335, ans=0.2 2023-11-28 20:11:26,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3652053.3333333335, ans=0.125 2023-11-28 20:11:28,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2023-11-28 20:11:35,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3652120.0, ans=0.125 2023-11-28 20:11:36,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652120.0, ans=0.1 2023-11-28 20:11:38,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-28 20:11:42,301 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6750, loss[loss=0.06162, simple_loss=0.0841, pruned_loss=0.01113, audio_tagging_loss=0.008442, over 13573.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08815, pruned_loss=0.01175, audio_tagging_loss=0.008638, over 3042011.34 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:11:59,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.62 vs. limit=10.0 2023-11-28 20:12:05,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3652320.0, ans=0.0 2023-11-28 20:12:06,710 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-28 20:12:26,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0 2023-11-28 20:12:29,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3652453.3333333335, ans=0.125 2023-11-28 20:12:38,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3652453.3333333335, ans=0.125 2023-11-28 20:12:43,531 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6800, loss[loss=0.06374, simple_loss=0.08216, pruned_loss=0.01214, audio_tagging_loss=0.01053, over 15774.00 frames. 
], tot_loss[loss=0.06418, simple_loss=0.08775, pruned_loss=0.01166, audio_tagging_loss=0.008643, over 3046623.79 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:12:44,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3652520.0, ans=0.125 2023-11-28 20:12:55,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-28 20:13:05,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.847e+01 9.241e+01 1.022e+02 1.257e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 20:13:07,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-28 20:13:38,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3652786.6666666665, ans=0.0 2023-11-28 20:13:45,148 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6850, loss[loss=0.06642, simple_loss=0.09283, pruned_loss=0.01236, audio_tagging_loss=0.007648, over 16054.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08763, pruned_loss=0.0118, audio_tagging_loss=0.008588, over 3047459.73 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:13:53,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2023-11-28 20:14:10,178 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-28 20:14:32,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3653053.3333333335, ans=0.07 2023-11-28 20:14:46,448 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6900, loss[loss=0.05736, simple_loss=0.07429, pruned_loss=0.01014, audio_tagging_loss=0.01007, over 14320.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08829, pruned_loss=0.01184, audio_tagging_loss=0.008598, over 3043650.53 frames. 
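The learning rate is pinned at 1.47e-03 throughout this stretch. With this run's Eden settings (base_lr 0.045, lr_batches 7500, lr_epochs 3.5, per the run configuration at the top of this log) the rate decays smoothly in both the global batch count and the epoch, and at batch ≈ 5.48e5 in epoch 46 it lands right around this value. A sketch of the Eden formula as commonly written in icefall, ignoring its warm-up factor (treat the details as assumptions):

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden: inverse-quartic decay in both batch count and epoch (sketch)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 548000, 46):.2e}")   # 1.45e-03, in line with the logged 1.47e-03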
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:14:51,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3653186.6666666665, ans=0.1 2023-11-28 20:14:59,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3653253.3333333335, ans=0.2 2023-11-28 20:15:00,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3653253.3333333335, ans=0.125 2023-11-28 20:15:02,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3653253.3333333335, ans=0.0 2023-11-28 20:15:03,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3653253.3333333335, ans=0.0 2023-11-28 20:15:09,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.041e+01 9.811e+01 1.026e+02 3.153e+02, threshold=1.962e+02, percent-clipped=1.0 2023-11-28 20:15:11,081 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-28 20:15:24,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3653320.0, ans=0.125 2023-11-28 20:15:27,644 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:15:32,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3653386.6666666665, ans=0.125 2023-11-28 20:15:37,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3653453.3333333335, ans=0.125 2023-11-28 20:15:39,563 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:15:42,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3653453.3333333335, ans=0.1 2023-11-28 20:15:50,612 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 6950, loss[loss=0.05007, simple_loss=0.06488, pruned_loss=0.008439, audio_tagging_loss=0.009189, over 14125.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08811, pruned_loss=0.01176, audio_tagging_loss=0.008574, over 3035416.18 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:16:11,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3653586.6666666665, ans=0.1 2023-11-28 20:16:11,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. 
limit=6.0 2023-11-28 20:16:13,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3653653.3333333335, ans=10.0 2023-11-28 20:16:14,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-28 20:16:25,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3653720.0, ans=0.125 2023-11-28 20:16:29,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3653720.0, ans=0.125 2023-11-28 20:16:51,976 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7000, loss[loss=0.05908, simple_loss=0.08469, pruned_loss=0.006355, audio_tagging_loss=0.01038, over 13919.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08799, pruned_loss=0.01163, audio_tagging_loss=0.008558, over 3035923.67 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:16:53,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3653853.3333333335, ans=0.2 2023-11-28 20:17:12,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-28 20:17:12,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3653920.0, ans=0.0 2023-11-28 20:17:12,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=12.0 2023-11-28 20:17:15,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.929e+01 9.464e+01 1.026e+02 1.498e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 20:17:16,489 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-28 20:17:23,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2023-11-28 20:17:45,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-28 20:17:53,521 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7050, loss[loss=0.04349, simple_loss=0.05122, pruned_loss=0.008249, audio_tagging_loss=0.009637, over 16468.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08806, pruned_loss=0.01173, audio_tagging_loss=0.008651, over 3037113.03 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:18:08,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3654253.3333333335, ans=0.0 2023-11-28 20:18:12,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3654253.3333333335, ans=0.125 2023-11-28 20:18:17,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2023-11-28 20:18:18,722 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-28 20:18:21,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3654320.0, ans=0.125 2023-11-28 20:18:30,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3654386.6666666665, ans=0.125 2023-11-28 20:18:45,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3654453.3333333335, ans=0.0 2023-11-28 20:18:56,452 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7100, loss[loss=0.05966, simple_loss=0.07511, pruned_loss=0.01255, audio_tagging_loss=0.009557, over 15407.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08853, pruned_loss=0.01193, audio_tagging_loss=0.008644, over 3038043.39 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:19:01,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3654520.0, ans=0.2 2023-11-28 20:19:09,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3654586.6666666665, ans=0.0 2023-11-28 20:19:11,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-28 20:19:16,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3654586.6666666665, ans=0.2 2023-11-28 20:19:18,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3654586.6666666665, ans=0.0 2023-11-28 20:19:19,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 9.027e+01 9.544e+01 1.031e+02 1.344e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:19:20,606 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-28 20:19:22,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3654653.3333333335, ans=0.125 2023-11-28 20:19:30,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-28 20:19:34,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654720.0, ans=0.1 2023-11-28 20:19:39,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3654720.0, ans=0.125 2023-11-28 20:19:49,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3654786.6666666665, ans=0.125 2023-11-28 20:19:50,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2023-11-28 20:19:58,795 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7150, loss[loss=0.06233, simple_loss=0.08682, pruned_loss=0.009521, audio_tagging_loss=0.009401, over 16882.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08906, pruned_loss=0.01198, audio_tagging_loss=0.008639, over 3040876.30 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:20:00,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654853.3333333335, ans=0.1 2023-11-28 20:20:03,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3654853.3333333335, ans=0.125 2023-11-28 20:20:09,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-28 20:20:22,549 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-28 20:20:24,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3654986.6666666665, ans=0.125 2023-11-28 20:20:29,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3654986.6666666665, ans=0.05 2023-11-28 20:20:59,410 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7200, loss[loss=0.07515, simple_loss=0.1014, pruned_loss=0.01638, audio_tagging_loss=0.008095, over 15108.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08862, pruned_loss=0.01195, audio_tagging_loss=0.008675, over 3043057.42 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:21:11,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:15,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3655253.3333333335, ans=0.1 2023-11-28 20:21:17,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3655253.3333333335, ans=0.0 2023-11-28 20:21:19,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:20,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:24,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.059e+01 9.674e+01 1.051e+02 1.523e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:21:24,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-28 20:21:26,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3655320.0, ans=0.0 2023-11-28 20:21:36,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-11-28 20:21:38,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3655386.6666666665, ans=0.125 2023-11-28 20:21:41,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3655386.6666666665, ans=0.125 2023-11-28 20:22:01,292 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7250, loss[loss=0.0631, simple_loss=0.08271, pruned_loss=0.01129, audio_tagging_loss=0.01045, over 15734.00 frames. 
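Each batch record prints two loss summaries: loss[...] over ~15k frames is the current batch alone, while tot_loss[...] over ~3.0M frames is a running, frame-weighted aggregate, which is why it barely moves (≈0.065 all through this stretch) while per-batch values swing between roughly 0.04 and 0.10. A sketch of an exponentially-forgetting aggregate consistent with those printouts; the decay constant is an inference from the ~3.0M-frame totals given ~15k frames per batch, not a value the log states:

class RunningLoss:
    """Frame-weighted running average with exponential forgetting (sketch)."""
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: int):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

At steady state the frame total converges to batch_frames / (1 - decay) ≈ 15,000 / 0.005 = 3.0e6, matching the "over ~3.0M frames" denominators in the tot_loss records.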
], tot_loss[loss=0.0652, simple_loss=0.08895, pruned_loss=0.01198, audio_tagging_loss=0.008753, over 3042207.73 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:22:26,074 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-28 20:23:01,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3655786.6666666665, ans=0.5 2023-11-28 20:23:03,243 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7300, loss[loss=0.06385, simple_loss=0.08742, pruned_loss=0.01228, audio_tagging_loss=0.00786, over 14450.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08951, pruned_loss=0.0121, audio_tagging_loss=0.008686, over 3048969.48 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:23:06,585 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:23:13,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-28 20:23:22,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3655920.0, ans=0.0 2023-11-28 20:23:27,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.815e+01 9.477e+01 1.035e+02 1.367e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 20:23:27,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-28 20:23:36,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3655986.6666666665, ans=0.125 2023-11-28 20:23:37,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3655986.6666666665, ans=0.125 2023-11-28 20:24:04,866 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7350, loss[loss=0.07241, simple_loss=0.1073, pruned_loss=0.01287, audio_tagging_loss=0.005885, over 15222.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08926, pruned_loss=0.01208, audio_tagging_loss=0.008565, over 3048521.55 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:24:05,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3656186.6666666665, ans=0.0 2023-11-28 20:24:12,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3656186.6666666665, ans=0.125 2023-11-28 20:24:13,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3656186.6666666665, ans=0.0 2023-11-28 20:24:17,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3656253.3333333335, ans=0.125 2023-11-28 20:24:20,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3656253.3333333335, ans=0.2 2023-11-28 20:24:29,520 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-28 20:24:59,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. 
limit=10.0 2023-11-28 20:25:00,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-28 20:25:06,614 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7400, loss[loss=0.05473, simple_loss=0.07143, pruned_loss=0.009661, audio_tagging_loss=0.009361, over 15419.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08974, pruned_loss=0.01217, audio_tagging_loss=0.008444, over 3042200.50 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:25:09,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3656520.0, ans=0.125 2023-11-28 20:25:30,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.862e+01 9.547e+01 1.020e+02 1.427e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:25:31,059 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-28 20:25:47,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3656720.0, ans=0.125 2023-11-28 20:25:57,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3656786.6666666665, ans=0.125 2023-11-28 20:25:59,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3656786.6666666665, ans=0.0 2023-11-28 20:26:05,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3656786.6666666665, ans=0.2 2023-11-28 20:26:07,284 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7450, loss[loss=0.09515, simple_loss=0.1248, pruned_loss=0.02506, audio_tagging_loss=0.007666, over 14992.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08968, pruned_loss=0.01232, audio_tagging_loss=0.008395, over 3042509.26 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:26:12,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3656853.3333333335, ans=0.1 2023-11-28 20:26:29,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3656920.0, ans=0.0 2023-11-28 20:26:32,758 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-28 20:26:35,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3656986.6666666665, ans=0.125 2023-11-28 20:26:47,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-28 20:26:48,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3657053.3333333335, ans=0.07 2023-11-28 20:26:50,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3657053.3333333335, ans=0.2 2023-11-28 20:26:51,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3657053.3333333335, ans=0.0 2023-11-28 20:26:56,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.20 vs. 
limit=15.0 2023-11-28 20:27:00,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3657120.0, ans=0.0 2023-11-28 20:27:09,849 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7500, loss[loss=0.06927, simple_loss=0.09793, pruned_loss=0.01481, audio_tagging_loss=0.005496, over 13854.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09003, pruned_loss=0.01229, audio_tagging_loss=0.00836, over 3042835.61 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:27:11,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3657186.6666666665, ans=0.0 2023-11-28 20:27:14,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3657186.6666666665, ans=0.125 2023-11-28 20:27:34,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.864e+01 9.552e+01 1.022e+02 1.615e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:27:34,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-28 20:27:37,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3657320.0, ans=15.0 2023-11-28 20:28:12,524 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7550, loss[loss=0.06531, simple_loss=0.08743, pruned_loss=0.01472, audio_tagging_loss=0.006878, over 14975.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0901, pruned_loss=0.01234, audio_tagging_loss=0.008461, over 3045634.08 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:28:18,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3657520.0, ans=0.125 2023-11-28 20:28:26,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3657586.6666666665, ans=0.125 2023-11-28 20:28:26,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3657586.6666666665, ans=0.125 2023-11-28 20:28:37,441 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-28 20:28:40,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2023-11-28 20:29:04,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3657786.6666666665, ans=0.0 2023-11-28 20:29:13,278 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7600, loss[loss=0.03934, simple_loss=0.04946, pruned_loss=0.003628, audio_tagging_loss=0.01098, over 15447.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08893, pruned_loss=0.01213, audio_tagging_loss=0.008458, over 3041509.49 frames. 
], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:29:15,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3657853.3333333335, ans=0.125 2023-11-28 20:29:38,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.036e+01 9.725e+01 1.055e+02 1.335e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 20:29:39,134 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-28 20:29:41,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3657986.6666666665, ans=0.1 2023-11-28 20:29:48,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3657986.6666666665, ans=15.0 2023-11-28 20:29:57,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2023-11-28 20:29:59,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658053.3333333335, ans=0.1 2023-11-28 20:30:15,870 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7650, loss[loss=0.05953, simple_loss=0.07872, pruned_loss=0.01048, audio_tagging_loss=0.009686, over 15450.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08844, pruned_loss=0.0121, audio_tagging_loss=0.008491, over 3034720.91 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:30:38,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5 2023-11-28 20:30:40,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-28 20:30:46,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3658320.0, ans=0.125 2023-11-28 20:30:50,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.84 vs. limit=10.0 2023-11-28 20:31:03,234 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:31:08,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3658453.3333333335, ans=0.125 2023-11-28 20:31:09,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2023-11-28 20:31:17,767 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7700, loss[loss=0.04736, simple_loss=0.05782, pruned_loss=0.005753, audio_tagging_loss=0.0127, over 16248.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08787, pruned_loss=0.01199, audio_tagging_loss=0.008473, over 3038735.91 frames. 
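The learning rate ticks down from 1.47e-03 to 1.46e-03 around batch 7600 here, consistent with icefall's Eden schedule under the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. A sketch assuming the standard Eden decay and an epoch count of 45 (completed epochs); both the formula and that epoch-count convention are assumptions:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden decays in both the global batch count and the epoch count.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Global batch idx ~548650 at per-epoch batch 7600, 45 completed epochs:
print(eden_lr(0.045, batch=548_650, epoch=45))  # ~0.001465, i.e. 1.46e-03
```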
], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:31:35,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3658586.6666666665, ans=0.2 2023-11-28 20:31:42,444 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-28 20:31:42,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3658653.3333333335, ans=0.025 2023-11-28 20:31:43,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.090e+01 9.039e+01 9.612e+01 1.044e+02 1.351e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:32:10,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3658786.6666666665, ans=0.0 2023-11-28 20:32:17,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3658786.6666666665, ans=0.2 2023-11-28 20:32:19,651 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7750, loss[loss=0.07503, simple_loss=0.09758, pruned_loss=0.01685, audio_tagging_loss=0.009385, over 15049.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08789, pruned_loss=0.01206, audio_tagging_loss=0.00857, over 3033938.10 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:32:31,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3658920.0, ans=0.125 2023-11-28 20:32:32,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-28 20:32:44,857 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-28 20:32:45,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.77 vs. limit=10.0 2023-11-28 20:32:46,365 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:32:58,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-11-28 20:33:00,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3659053.3333333335, ans=0.0 2023-11-28 20:33:00,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3659053.3333333335, ans=0.125 2023-11-28 20:33:02,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3659053.3333333335, ans=0.125 2023-11-28 20:33:03,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-28 20:33:19,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3659120.0, ans=0.0 2023-11-28 20:33:22,011 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7800, loss[loss=0.05409, simple_loss=0.07048, pruned_loss=0.009483, audio_tagging_loss=0.009368, over 16275.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.0883, pruned_loss=0.01206, audio_tagging_loss=0.008528, over 3037760.79 frames. 
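Most of the volume in this stretch is scaling.py:213 reporting ScheduledFloat values: each scheduled hyper-parameter (balancer probs, skip rates, dropout_p, ...) is a function of the module's batch_count, and by batch_count ≈ 3.66M nearly all have flattened onto their final values (0.125, 0.0, 0.1, 0.2, ...). A minimal sketch of such a schedule as a piecewise-linear function of batch_count; the breakpoints below are invented for illustration, and the real class also supports defaults and arithmetic on schedules:

```python
from bisect import bisect_right
from typing import Tuple

# Minimal sketch of a ScheduledFloat-style value: piecewise-linear in
# batch_count, constant beyond the first/last breakpoints.

class ScheduledFloatSketch:
    def __init__(self, *points: Tuple[float, float]):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout_p annealed from 0.3 to 0.1 over the first 20k batches
# (illustrative breakpoints, not the run's actual schedule):
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3_654_720.0))  # 0.1 -- long past the last breakpoint
```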
], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:33:33,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3659186.6666666665, ans=0.125 2023-11-28 20:33:38,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3659253.3333333335, ans=0.125 2023-11-28 20:33:39,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2023-11-28 20:33:47,413 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-28 20:33:48,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.914e+01 9.757e+01 1.064e+02 1.306e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-28 20:33:58,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3659386.6666666665, ans=0.0 2023-11-28 20:34:05,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3659386.6666666665, ans=0.2 2023-11-28 20:34:13,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3659453.3333333335, ans=0.125 2023-11-28 20:34:24,312 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7850, loss[loss=0.05208, simple_loss=0.06317, pruned_loss=0.009618, audio_tagging_loss=0.01087, over 13446.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08911, pruned_loss=0.01221, audio_tagging_loss=0.008537, over 3036768.13 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:34:34,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3659520.0, ans=0.0 2023-11-28 20:34:41,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3659586.6666666665, ans=0.125 2023-11-28 20:34:45,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3659586.6666666665, ans=0.125 2023-11-28 20:34:49,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-28 20:35:23,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3659786.6666666665, ans=0.0 2023-11-28 20:35:23,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=22.5 2023-11-28 20:35:25,336 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7900, loss[loss=0.07511, simple_loss=0.1043, pruned_loss=0.01428, audio_tagging_loss=0.008686, over 15393.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09046, pruned_loss=0.01244, audio_tagging_loss=0.008534, over 3046887.52 frames. 
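The grad_scale field is the AMP loss scale (the run uses use_fp16=True): torch's GradScaler skips the optimizer step and halves the scale when scaled gradients contain inf/nan, then grows it back after a run of clean steps, which is why the value oscillates between 16.0 and 32.0 over this stretch. A sketch of the loop shape, with model/optimizer/batch as placeholders:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Sketch of the fp16 step implied by use_fp16=True; scaler.get_scale()
# is the value the log prints as grad_scale.

scaler = GradScaler(enabled=True)

def training_step(model, optimizer, batch):
    # model/optimizer/batch stand in for the real training objects.
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)      # placeholder forward returning a scalar
    scaler.scale(loss).backward()
    scaler.step(optimizer)       # skipped internally on overflow
    scaler.update()              # halves the scale or grows it back
    return scaler.get_scale()
```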
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:35:26,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3659853.3333333335, ans=0.125 2023-11-28 20:35:49,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-28 20:35:50,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 9.083e+01 9.481e+01 1.023e+02 1.467e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 20:36:26,834 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 7950, loss[loss=0.06826, simple_loss=0.08614, pruned_loss=0.01662, audio_tagging_loss=0.008565, over 14877.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08993, pruned_loss=0.01243, audio_tagging_loss=0.008638, over 3046667.26 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:36:39,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3660253.3333333335, ans=0.1 2023-11-28 20:36:44,727 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:36:48,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3660253.3333333335, ans=0.0 2023-11-28 20:36:51,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3660320.0, ans=0.2 2023-11-28 20:36:52,486 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-28 20:37:05,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3660386.6666666665, ans=0.125 2023-11-28 20:37:12,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3660386.6666666665, ans=0.0 2023-11-28 20:37:28,996 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8000, loss[loss=0.06709, simple_loss=0.09773, pruned_loss=0.009117, audio_tagging_loss=0.009105, over 16118.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08904, pruned_loss=0.01223, audio_tagging_loss=0.008776, over 3053817.97 frames. 
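The WARNING above is the length filter at work: these AudioSet cuts carry a dummy transcript, and after the encoder's roughly 4x subsampling this 100-frame (1-second) cut yields 23 encoder frames, one fewer than its 24 BPE tokens, so the transducer loss would be undefined and the cut is dropped. A sketch of the check; the exact subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23 mapping:

```python
# Sketch of the validity check behind the "Exclude cut" warnings: the
# transducer loss needs at least as many encoder frames as target tokens.

def frames_after_subsampling(t: int) -> int:
    # Assumed form of the factor-4 encoder_embed arithmetic (100 -> 23);
    # the exact expression lives in the subsampling module.
    return ((t - 7) // 2) // 2

def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_input_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the 1-second dummy AudioSet cuts get dropped
```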
], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:37:43,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3660586.6666666665, ans=0.125 2023-11-28 20:37:53,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-28 20:37:54,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.902e+01 9.396e+01 1.018e+02 1.315e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 20:38:02,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3660653.3333333335, ans=0.125 2023-11-28 20:38:13,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660720.0, ans=0.1 2023-11-28 20:38:26,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3660786.6666666665, ans=0.025 2023-11-28 20:38:31,197 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8050, loss[loss=0.07539, simple_loss=0.09831, pruned_loss=0.01439, audio_tagging_loss=0.01185, over 16559.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.0877, pruned_loss=0.01189, audio_tagging_loss=0.008898, over 3054048.45 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:38:46,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3660920.0, ans=0.0 2023-11-28 20:38:46,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3660920.0, ans=0.125 2023-11-28 20:38:55,151 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-28 20:38:57,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3660986.6666666665, ans=0.125 2023-11-28 20:39:00,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3660986.6666666665, ans=0.1 2023-11-28 20:39:32,429 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8100, loss[loss=0.06907, simple_loss=0.1042, pruned_loss=0.01085, audio_tagging_loss=0.006111, over 15552.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08824, pruned_loss=0.01194, audio_tagging_loss=0.008841, over 3051022.81 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:39:37,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3661186.6666666665, ans=0.125 2023-11-28 20:39:47,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3661253.3333333335, ans=0.125 2023-11-28 20:39:56,948 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-28 20:40:00,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 9.129e+01 9.751e+01 1.053e+02 1.304e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 20:40:06,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3661320.0, ans=0.0 2023-11-28 20:40:14,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3661386.6666666665, ans=0.2 2023-11-28 20:40:23,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3661453.3333333335, ans=0.125 2023-11-28 20:40:23,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3661453.3333333335, ans=0.2 2023-11-28 20:40:34,216 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8150, loss[loss=0.05959, simple_loss=0.07519, pruned_loss=0.01031, audio_tagging_loss=0.01168, over 16042.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08862, pruned_loss=0.012, audio_tagging_loss=0.008755, over 3054533.97 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:40:52,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-11-28 20:40:54,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3661586.6666666665, ans=0.2 2023-11-28 20:40:58,833 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-28 20:41:14,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3661720.0, ans=0.0 2023-11-28 20:41:29,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. limit=10.0 2023-11-28 20:41:35,440 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8200, loss[loss=0.07509, simple_loss=0.111, pruned_loss=0.01382, audio_tagging_loss=0.005797, over 15633.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08955, pruned_loss=0.01214, audio_tagging_loss=0.008581, over 3055479.04 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:41:36,771 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 20:41:59,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-28 20:42:02,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.719e+01 9.487e+01 1.033e+02 1.453e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 20:42:05,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3661986.6666666665, ans=0.125 2023-11-28 20:42:27,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3662120.0, ans=0.0 2023-11-28 20:42:36,673 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8250, loss[loss=0.06078, simple_loss=0.0829, pruned_loss=0.009039, audio_tagging_loss=0.01029, over 15995.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08873, pruned_loss=0.01193, audio_tagging_loss=0.008613, over 3062203.11 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:42:45,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3662186.6666666665, ans=0.04949747468305833 2023-11-28 20:42:58,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3662253.3333333335, ans=0.05 2023-11-28 20:43:00,564 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-28 20:43:09,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3662320.0, ans=0.125 2023-11-28 20:43:37,432 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8300, loss[loss=0.04963, simple_loss=0.06366, pruned_loss=0.01017, audio_tagging_loss=0.007625, over 14434.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0896, pruned_loss=0.01214, audio_tagging_loss=0.008578, over 3050288.55 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:43:38,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3662520.0, ans=0.0 2023-11-28 20:44:02,496 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-28 20:44:05,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.200e+01 9.674e+01 1.037e+02 1.231e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:44:16,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3662720.0, ans=0.0 2023-11-28 20:44:19,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3662720.0, ans=0.0 2023-11-28 20:44:27,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3662786.6666666665, ans=0.125 2023-11-28 20:44:39,621 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8350, loss[loss=0.06537, simple_loss=0.09609, pruned_loss=0.01077, audio_tagging_loss=0.006557, over 14713.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0896, pruned_loss=0.01213, audio_tagging_loss=0.008561, over 3051419.58 frames. 
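The scaling.py:1022 lines are the Whiten modules' diagnostics: a statistic of the per-group channel covariance that is 1.0 when the covariance is proportional to the identity and grows toward the group size as energy concentrates in a few directions. The module only intervenes, via gradient modification, when the metric exceeds the printed limit; most records here stay under it, with occasional excursions (e.g. 23.86 vs. limit=22.5 further down). A plausible reconstruction of the statistic; the exact formula in scaling.py may differ:

```python
import torch

# Assumed form of the whitening metric: per group, with g x g channel
# covariance C, metric = g * trace(C @ C) / trace(C)**2, averaged over
# groups.  Equals 1.0 for C proportional to the identity; approaches g
# when a single direction holds all the energy.

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    n, c = x.shape
    g = c // num_groups
    xg = x.reshape(n, num_groups, g).transpose(0, 1)   # (groups, N, g)
    cov = xg.transpose(1, 2) @ xg / n                  # (groups, g, g)
    trace_sq = cov.diagonal(dim1=1, dim2=2).sum(-1) ** 2
    frob_sq = (cov * cov).sum(dim=(1, 2))              # trace(C @ C), C symmetric
    return (g * frob_sq / trace_sq).mean().item()

x = torch.randn(4000, 256)                 # near-white input
print(whitening_metric(x, num_groups=1))   # ~1, slightly above from sampling noise
```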
], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:44:45,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3662853.3333333335, ans=0.0 2023-11-28 20:45:04,376 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-28 20:45:28,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=12.0 2023-11-28 20:45:33,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2023-11-28 20:45:40,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2023-11-28 20:45:40,900 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8400, loss[loss=0.06334, simple_loss=0.08521, pruned_loss=0.01268, audio_tagging_loss=0.008056, over 15364.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0882, pruned_loss=0.01193, audio_tagging_loss=0.008616, over 3045549.03 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:46:05,533 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-28 20:46:07,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.725e+01 9.379e+01 9.933e+01 1.253e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 20:46:13,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3663320.0, ans=0.125 2023-11-28 20:46:32,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3663453.3333333335, ans=0.125 2023-11-28 20:46:37,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2023-11-28 20:46:42,565 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8450, loss[loss=0.08356, simple_loss=0.1219, pruned_loss=0.01679, audio_tagging_loss=0.005827, over 15851.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08882, pruned_loss=0.01206, audio_tagging_loss=0.008614, over 3046510.93 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:46:42,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3663520.0, ans=0.125 2023-11-28 20:46:56,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3663586.6666666665, ans=0.0 2023-11-28 20:47:07,183 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-28 20:47:09,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2023-11-28 20:47:36,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.91 vs. limit=15.0 2023-11-28 20:47:44,300 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8500, loss[loss=0.05884, simple_loss=0.08881, pruned_loss=0.007336, audio_tagging_loss=0.007095, over 15356.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08872, pruned_loss=0.0119, audio_tagging_loss=0.008597, over 3048779.17 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:47:45,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3663853.3333333335, ans=0.0 2023-11-28 20:47:48,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-28 20:47:52,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3663853.3333333335, ans=0.1 2023-11-28 20:47:52,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3663853.3333333335, ans=0.1 2023-11-28 20:48:09,488 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-28 20:48:11,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.752e+01 9.470e+01 1.030e+02 1.283e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 20:48:27,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3664053.3333333335, ans=0.0 2023-11-28 20:48:46,235 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8550, loss[loss=0.06353, simple_loss=0.0871, pruned_loss=0.01208, audio_tagging_loss=0.007902, over 15149.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0882, pruned_loss=0.01199, audio_tagging_loss=0.008682, over 3048666.17 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:48:55,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3664186.6666666665, ans=0.025 2023-11-28 20:49:00,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3664253.3333333335, ans=0.125 2023-11-28 20:49:05,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-28 20:49:10,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-28 20:49:22,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3664386.6666666665, ans=0.125 2023-11-28 20:49:24,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3664386.6666666665, ans=0.0 2023-11-28 20:49:34,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3664453.3333333335, ans=0.0 2023-11-28 20:49:34,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-28 20:49:47,962 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8600, loss[loss=0.06218, simple_loss=0.08272, pruned_loss=0.01152, audio_tagging_loss=0.009299, over 14463.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08863, pruned_loss=0.01207, audio_tagging_loss=0.008685, over 3050016.46 frames. 
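The balancer.* entries (prob, min_positive, max_positive, min_abs, max_abs) are the scheduled constraints of the Balancer modules, which police per-channel activation statistics, namely the fraction of positive values and the RMS magnitude, and correct violations through gradient manipulation applied with probability prob (0.125 here). A sketch of the statistics being policed; the real module alters gradients rather than activations, and the thresholds below are illustrative defaults:

```python
import torch

# Sketch: the per-channel statistics a Balancer keeps inside
# [min_positive, max_positive] and [min_abs, max_abs].

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_positive: float = 0.95,
                        min_abs: float = 0.2,
                        max_abs: float = 10.0) -> dict:
    # x: (N, C); reduce over N to get per-channel stats.
    frac_positive = (x > 0).float().mean(dim=0)
    rms = x.pow(2).mean(dim=0).sqrt()
    return {
        "too_few_positive": (frac_positive < min_positive).sum().item(),
        "too_many_positive": (frac_positive > max_positive).sum().item(),
        "rms_too_small": (rms < min_abs).sum().item(),
        "rms_too_large": (rms > max_abs).sum().item(),
    }

print(balancer_violations(torch.randn(512, 256)))
```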
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:49:53,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3664520.0, ans=0.1 2023-11-28 20:49:59,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3664586.6666666665, ans=0.05 2023-11-28 20:50:00,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3664586.6666666665, ans=0.125 2023-11-28 20:50:05,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2023-11-28 20:50:12,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-28 20:50:16,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.961e+01 9.460e+01 1.024e+02 1.421e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 20:50:35,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3664720.0, ans=0.1 2023-11-28 20:50:35,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.89 vs. limit=10.0 2023-11-28 20:50:43,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3664786.6666666665, ans=0.2 2023-11-28 20:50:49,975 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8650, loss[loss=0.09071, simple_loss=0.1248, pruned_loss=0.02039, audio_tagging_loss=0.007938, over 16537.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08942, pruned_loss=0.01214, audio_tagging_loss=0.008686, over 3049352.22 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:50:50,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3664853.3333333335, ans=0.95 2023-11-28 20:51:01,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.86 vs. 
limit=22.5 2023-11-28 20:51:07,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3664920.0, ans=0.2 2023-11-28 20:51:14,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3664986.6666666665, ans=0.1 2023-11-28 20:51:14,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3664986.6666666665, ans=0.125 2023-11-28 20:51:15,644 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-28 20:51:21,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3664986.6666666665, ans=0.125 2023-11-28 20:51:27,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3665053.3333333335, ans=0.125 2023-11-28 20:51:28,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3665053.3333333335, ans=0.125 2023-11-28 20:51:39,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3665120.0, ans=0.0 2023-11-28 20:51:42,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3665120.0, ans=0.09899494936611666 2023-11-28 20:51:51,527 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8700, loss[loss=0.05912, simple_loss=0.0761, pruned_loss=0.01304, audio_tagging_loss=0.00803, over 14839.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08925, pruned_loss=0.01196, audio_tagging_loss=0.00871, over 3045467.76 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:52:10,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3665253.3333333335, ans=0.09899494936611666 2023-11-28 20:52:16,872 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-28 20:52:20,543 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.859e+01 9.712e+01 1.046e+02 1.344e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 20:52:31,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3665386.6666666665, ans=0.125 2023-11-28 20:52:34,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3665386.6666666665, ans=0.07 2023-11-28 20:52:53,900 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8750, loss[loss=0.0549, simple_loss=0.07472, pruned_loss=0.00763, audio_tagging_loss=0.009911, over 14854.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0892, pruned_loss=0.01202, audio_tagging_loss=0.008766, over 3044765.82 frames. 
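The fractional frame counts in tot_loss (e.g. "over 3044765.82 frames") indicate a decaying aggregate rather than a plain sum: prior statistics appear to be damped each batch, so the printed average reflects roughly the last reset_interval=200 batches. A sketch under that assumption; the damping form is a guess that reproduces the observed ~3.0M steady-state frame count for ~15k-frame batches:

```python
# Sketch of a decaying aggregate: damp the running statistics by
# (1 - 1/reset_interval) each step before adding the new batch.

RESET_INTERVAL = 200  # from the run configuration

def update_tot(tot: dict, batch: dict) -> dict:
    keep = 1.0 - 1.0 / RESET_INTERVAL
    out = {k: keep * v for k, v in tot.items()}
    for k, v in batch.items():
        out[k] = out.get(k, 0.0) + v
    return out

tot = {"loss_sum": 0.0, "frames": 0.0}
for _ in range(2000):
    tot = update_tot(tot, {"loss_sum": 0.065 * 15000, "frames": 15000.0})
print(tot["frames"])                      # ~3.0e6 fractional, like the log
print(tot["loss_sum"] / tot["frames"])    # the normalized tot_loss value
```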
], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:52:54,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3665520.0, ans=0.125 2023-11-28 20:52:58,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3665520.0, ans=0.125 2023-11-28 20:53:18,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-28 20:53:55,661 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8800, loss[loss=0.05039, simple_loss=0.072, pruned_loss=0.005204, audio_tagging_loss=0.00919, over 15693.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09056, pruned_loss=0.01219, audio_tagging_loss=0.008813, over 3056720.89 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:54:19,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-28 20:54:23,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.083e+01 9.097e+01 9.649e+01 1.028e+02 1.198e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 20:54:35,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-28 20:54:43,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3666120.0, ans=0.125 2023-11-28 20:54:45,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-28 20:54:56,778 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8850, loss[loss=0.05423, simple_loss=0.0715, pruned_loss=0.01052, audio_tagging_loss=0.007957, over 14501.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09034, pruned_loss=0.01222, audio_tagging_loss=0.008809, over 3053423.84 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:54:57,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3666186.6666666665, ans=0.125 2023-11-28 20:55:00,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3666186.6666666665, ans=0.125 2023-11-28 20:55:01,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3666186.6666666665, ans=0.2 2023-11-28 20:55:06,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3666186.6666666665, ans=0.04949747468305833 2023-11-28 20:55:09,119 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 20:55:21,907 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-28 20:55:28,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3666320.0, ans=0.125 2023-11-28 20:55:58,472 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8900, loss[loss=0.08321, simple_loss=0.1118, pruned_loss=0.01853, audio_tagging_loss=0.008797, over 15649.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09084, pruned_loss=0.01235, audio_tagging_loss=0.008612, over 3056300.54 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:56:17,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3666586.6666666665, ans=0.0 2023-11-28 20:56:23,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-28 20:56:23,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-28 20:56:29,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.959e+01 9.608e+01 1.039e+02 1.784e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:56:30,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3666653.3333333335, ans=10.0 2023-11-28 20:56:33,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0 2023-11-28 20:56:37,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3666720.0, ans=0.125 2023-11-28 20:56:41,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3666720.0, ans=0.125 2023-11-28 20:57:00,013 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 8950, loss[loss=0.08334, simple_loss=0.1193, pruned_loss=0.01839, audio_tagging_loss=0.00531, over 14701.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09081, pruned_loss=0.01231, audio_tagging_loss=0.008524, over 3055976.20 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:57:18,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3666920.0, ans=0.1 2023-11-28 20:57:19,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2023-11-28 20:57:24,746 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-28 20:57:37,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3667053.3333333335, ans=0.0 2023-11-28 20:58:01,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-11-28 20:58:02,920 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9000, loss[loss=0.06174, simple_loss=0.07947, pruned_loss=0.01325, audio_tagging_loss=0.008761, over 16191.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09068, pruned_loss=0.01229, audio_tagging_loss=0.008405, over 3056539.06 frames. 
], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:58:02,921 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 20:58:42,563 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05897, simple_loss=0.05047, pruned_loss=0.005253, audio_tagging_loss=0.02848, over 4681554.00 frames. 2023-11-28 20:58:42,564 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 20:58:48,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3667186.6666666665, ans=0.2 2023-11-28 20:58:56,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-28 20:59:07,646 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-28 20:59:11,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3667320.0, ans=0.0 2023-11-28 20:59:13,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.853e+01 9.549e+01 1.044e+02 1.258e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:59:17,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5 2023-11-28 20:59:19,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3667386.6666666665, ans=0.2 2023-11-28 20:59:23,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3667386.6666666665, ans=0.0 2023-11-28 20:59:44,796 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9050, loss[loss=0.06375, simple_loss=0.08747, pruned_loss=0.01276, audio_tagging_loss=0.007263, over 15087.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09038, pruned_loss=0.01224, audio_tagging_loss=0.008342, over 3055560.64 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:59:48,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3667520.0, ans=0.0 2023-11-28 21:00:05,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3667586.6666666665, ans=0.0 2023-11-28 21:00:09,587 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-28 21:00:20,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3667720.0, ans=0.2 2023-11-28 21:00:46,677 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9100, loss[loss=0.07963, simple_loss=0.1083, pruned_loss=0.01549, audio_tagging_loss=0.009986, over 15245.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09032, pruned_loss=0.01224, audio_tagging_loss=0.008408, over 3059645.72 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:00:47,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3667853.3333333335, ans=0.125 2023-11-28 21:01:02,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2023-11-28 21:01:04,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.16 vs. 
limit=15.0 2023-11-28 21:01:12,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-28 21:01:18,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.959e+01 9.673e+01 1.042e+02 1.442e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 21:01:22,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3667986.6666666665, ans=0.1 2023-11-28 21:01:48,502 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9150, loss[loss=0.07005, simple_loss=0.09437, pruned_loss=0.01407, audio_tagging_loss=0.008788, over 14530.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08974, pruned_loss=0.01211, audio_tagging_loss=0.008385, over 3061405.39 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:01:49,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3668186.6666666665, ans=0.125 2023-11-28 21:02:02,239 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:02:13,331 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-28 21:02:14,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3668320.0, ans=0.125 2023-11-28 21:02:27,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-28 21:02:44,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3668453.3333333335, ans=0.0 2023-11-28 21:02:47,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-28 21:02:50,552 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9200, loss[loss=0.06668, simple_loss=0.09158, pruned_loss=0.01253, audio_tagging_loss=0.008355, over 15060.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0897, pruned_loss=0.01221, audio_tagging_loss=0.00843, over 3058770.22 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:02:57,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3668520.0, ans=0.0 2023-11-28 21:03:14,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-28 21:03:21,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.689e+01 9.468e+01 1.009e+02 1.302e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 21:03:48,194 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:03:52,650 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9250, loss[loss=0.05469, simple_loss=0.06752, pruned_loss=0.008726, audio_tagging_loss=0.01221, over 14115.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08924, pruned_loss=0.01218, audio_tagging_loss=0.008392, over 3054990.03 frames. 
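The validation pass above lands exactly on the configured cadence, valid_interval=3000 (hence batch 9000), reporting loss=0.05897 over the fixed 4681554-frame dev set, with audio_tagging_loss (0.02848) the dominant component, followed by the peak-memory report of 25568MB. A sketch of the cadence, with compute_validation_loss standing in for the real dev-set loop:

```python
import torch

# Sketch: run a full dev-set pass every valid_interval training batches
# and report peak CUDA memory, as the batch-9000 records above do.

VALID_INTERVAL = 3000

def maybe_validate(batch_idx: int, compute_validation_loss) -> None:
    if batch_idx % VALID_INTERVAL != 0:
        return
    valid_loss = compute_validation_loss()
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={valid_loss:.4}, "
          f"max memory allocated so far: {peak_mb}MB")
```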
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:04:05,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3668920.0, ans=0.07 2023-11-28 21:04:08,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-11-28 21:04:15,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3668920.0, ans=0.1 2023-11-28 21:04:17,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-28 21:04:21,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.72 vs. limit=22.5 2023-11-28 21:04:31,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3669053.3333333335, ans=0.125 2023-11-28 21:04:54,356 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9300, loss[loss=0.07708, simple_loss=0.1064, pruned_loss=0.01466, audio_tagging_loss=0.009212, over 16448.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08908, pruned_loss=0.012, audio_tagging_loss=0.008439, over 3056363.29 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:05:01,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3669186.6666666665, ans=0.09899494936611666 2023-11-28 21:05:04,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3669186.6666666665, ans=0.0 2023-11-28 21:05:18,882 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-28 21:05:26,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.959e+01 9.529e+01 1.028e+02 1.391e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 21:05:38,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3669386.6666666665, ans=0.125 2023-11-28 21:05:41,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-11-28 21:05:52,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3669453.3333333335, ans=0.125 2023-11-28 21:05:54,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0 2023-11-28 21:05:54,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2023-11-28 21:05:56,365 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9350, loss[loss=0.05815, simple_loss=0.07067, pruned_loss=0.009616, audio_tagging_loss=0.0132, over 13619.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08954, pruned_loss=0.01224, audio_tagging_loss=0.008443, over 3049657.79 frames. 
], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:06:11,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3669586.6666666665, ans=0.0 2023-11-28 21:06:20,868 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-28 21:06:45,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3669786.6666666665, ans=0.125 2023-11-28 21:06:58,196 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9400, loss[loss=0.06277, simple_loss=0.08792, pruned_loss=0.0103, audio_tagging_loss=0.008514, over 15345.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0886, pruned_loss=0.01194, audio_tagging_loss=0.008566, over 3053975.31 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:07:04,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3669853.3333333335, ans=0.2 2023-11-28 21:07:06,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3669853.3333333335, ans=0.1 2023-11-28 21:07:08,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3669853.3333333335, ans=0.125 2023-11-28 21:07:22,474 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-28 21:07:30,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.168e+01 9.669e+01 1.024e+02 1.175e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 21:07:31,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3669986.6666666665, ans=0.0 2023-11-28 21:07:34,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3670053.3333333335, ans=0.125 2023-11-28 21:07:54,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3670120.0, ans=0.125 2023-11-28 21:07:56,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3670120.0, ans=0.125 2023-11-28 21:07:58,114 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:07:59,926 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9450, loss[loss=0.06198, simple_loss=0.09237, pruned_loss=0.008924, audio_tagging_loss=0.006873, over 15630.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0892, pruned_loss=0.01201, audio_tagging_loss=0.008637, over 3054472.25 frames. 
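The WARNING above ("Exclude cut ... Number of frames (after subsampling): 23 ... Number of tokens: 24") shows the sanity filter at train_asr.py:1481 dropping 1-second AudioSet clips whose placeholder transcript is longer than the acoustic sequence the encoder can emit. A sketch of the predicate, assuming roughly 4x frontend subsampling; the exact (T - 7) // 4 arithmetic is an assumption that happens to reproduce the logged 100 -> 23.

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one output frame per token, and the
    # convolutional frontend subsamples by ~4x; the exact formula below
    # is assumed, chosen to map 100 -> 23 as in the warning.
    frames_after_subsampling = (num_frames - 7) // 4
    return frames_after_subsampling >= num_tokens

# The 1-second clips carry a 24-token "Dummy text" placeholder transcript,
# so they fail the check and are excluded:
print(keep_cut(100, 24))  # False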
], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:08:19,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3670253.3333333335, ans=0.0 2023-11-28 21:08:21,907 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:08:24,104 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-28 21:08:31,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3670320.0, ans=0.0 2023-11-28 21:08:39,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3670386.6666666665, ans=0.125 2023-11-28 21:09:01,342 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9500, loss[loss=0.06719, simple_loss=0.08682, pruned_loss=0.01152, audio_tagging_loss=0.01226, over 14369.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09002, pruned_loss=0.01211, audio_tagging_loss=0.008694, over 3057677.14 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:09:02,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0 2023-11-28 21:09:15,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-28 21:09:25,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-28 21:09:32,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3670653.3333333335, ans=0.125 2023-11-28 21:09:32,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3670653.3333333335, ans=0.125 2023-11-28 21:09:33,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 9.097e+01 9.748e+01 1.049e+02 1.377e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 21:09:38,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3670720.0, ans=0.125 2023-11-28 21:09:38,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0 2023-11-28 21:10:00,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3670786.6666666665, ans=0.2 2023-11-28 21:10:03,517 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9550, loss[loss=0.06322, simple_loss=0.08219, pruned_loss=0.0108, audio_tagging_loss=0.01133, over 14895.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09007, pruned_loss=0.01198, audio_tagging_loss=0.0088, over 3052298.93 frames. 
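On the "Whitening: name=..., metric=... vs. limit=..." records: the metric is a whiteness statistic of the named activation, equal to 1.0 when the per-group covariance is perfectly isotropic and growing as it becomes ill-conditioned; a corrective penalty only engages once the metric exceeds the limit. Below is a sketch of one such statistic (mean squared eigenvalue over squared mean eigenvalue), which matches the logged behaviour but is not guaranteed to be scaling.py's exact formula.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns E[lambda^2] / (E[lambda])^2 of
    # the per-group feature covariance, averaged over groups; 1.0 iff white.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames  # (num_groups, C/G, C/G)
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2).mean()

print(whitening_metric(torch.randn(2000, 256)))  # modestly above 1.0 for white noise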
], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:10:11,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3670853.3333333335, ans=0.125 2023-11-28 21:10:17,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3670920.0, ans=0.0 2023-11-28 21:10:25,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3670920.0, ans=0.125 2023-11-28 21:10:27,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-28 21:10:29,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2023-11-28 21:10:36,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2023-11-28 21:10:40,164 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:10:57,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3671120.0, ans=0.2 2023-11-28 21:11:04,575 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9600, loss[loss=0.06834, simple_loss=0.09631, pruned_loss=0.01475, audio_tagging_loss=0.00543, over 16132.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08941, pruned_loss=0.01179, audio_tagging_loss=0.00877, over 3045709.23 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:11:29,134 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-28 21:11:36,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.932e+01 9.584e+01 1.005e+02 1.481e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 21:11:46,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671386.6666666665, ans=0.1 2023-11-28 21:11:47,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3671386.6666666665, ans=15.0 2023-11-28 21:12:06,286 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9650, loss[loss=0.04549, simple_loss=0.0591, pruned_loss=0.006684, audio_tagging_loss=0.009261, over 15000.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08899, pruned_loss=0.01175, audio_tagging_loss=0.008816, over 3050395.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:12:31,733 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-28 21:12:45,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3671720.0, ans=0.1 2023-11-28 21:13:07,731 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9700, loss[loss=0.07032, simple_loss=0.096, pruned_loss=0.01413, audio_tagging_loss=0.008195, over 15677.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08954, pruned_loss=0.0121, audio_tagging_loss=0.008687, over 3049608.21 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:13:10,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=10.0 2023-11-28 21:13:18,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3671853.3333333335, ans=0.0 2023-11-28 21:13:31,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3671986.6666666665, ans=0.125 2023-11-28 21:13:32,971 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-28 21:13:38,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3671986.6666666665, ans=0.2 2023-11-28 21:13:40,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.004e+01 9.586e+01 1.046e+02 1.916e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 21:13:57,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3672120.0, ans=0.125 2023-11-28 21:14:02,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3672120.0, ans=0.125 2023-11-28 21:14:08,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3672120.0, ans=0.125 2023-11-28 21:14:10,532 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9750, loss[loss=0.06655, simple_loss=0.09314, pruned_loss=0.0126, audio_tagging_loss=0.00738, over 15665.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08914, pruned_loss=0.01185, audio_tagging_loss=0.008534, over 3044888.61 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:14:21,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-28 21:14:24,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2023-11-28 21:14:25,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-28 21:14:25,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2023-11-28 21:14:35,360 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-28 21:14:38,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=12.0 2023-11-28 21:14:41,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3672320.0, ans=0.0 2023-11-28 21:15:06,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=15.0 2023-11-28 21:15:11,882 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9800, loss[loss=0.06835, simple_loss=0.09386, pruned_loss=0.01169, audio_tagging_loss=0.009726, over 14788.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08874, pruned_loss=0.01184, audio_tagging_loss=0.008578, over 3041804.28 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:15:36,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-28 21:15:37,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3672653.3333333335, ans=0.0 2023-11-28 21:15:43,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.839e+01 9.561e+01 1.020e+02 1.364e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 21:15:51,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3672720.0, ans=0.0 2023-11-28 21:16:07,140 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:16:12,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3672853.3333333335, ans=0.0 2023-11-28 21:16:12,991 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9850, loss[loss=0.06902, simple_loss=0.1027, pruned_loss=0.01042, audio_tagging_loss=0.007267, over 15579.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08905, pruned_loss=0.01176, audio_tagging_loss=0.008454, over 3047711.05 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:16:33,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3672920.0, ans=0.125 2023-11-28 21:16:38,347 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-28 21:16:53,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3673053.3333333335, ans=0.2 2023-11-28 21:17:13,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3673186.6666666665, ans=0.0 2023-11-28 21:17:14,276 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9900, loss[loss=0.06255, simple_loss=0.09291, pruned_loss=0.01238, audio_tagging_loss=0.003715, over 14664.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08896, pruned_loss=0.01171, audio_tagging_loss=0.008457, over 3049168.81 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:17:34,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3673253.3333333335, ans=0.125 2023-11-28 21:17:39,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-28 21:17:48,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.969e+01 9.517e+01 1.006e+02 1.259e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 21:18:03,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3673453.3333333335, ans=0.0 2023-11-28 21:18:16,954 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 9950, loss[loss=0.06894, simple_loss=0.08821, pruned_loss=0.01445, audio_tagging_loss=0.01038, over 14734.00 frames. 
], tot_loss[loss=0.06427, simple_loss=0.08851, pruned_loss=0.01156, audio_tagging_loss=0.008446, over 3047769.88 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:18:23,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3673520.0, ans=0.125 2023-11-28 21:18:23,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2023-11-28 21:18:41,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-28 21:19:18,505 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10000, loss[loss=0.07488, simple_loss=0.1005, pruned_loss=0.01676, audio_tagging_loss=0.007863, over 14992.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.08755, pruned_loss=0.0115, audio_tagging_loss=0.008511, over 3045717.35 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:19:21,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3673853.3333333335, ans=0.125 2023-11-28 21:19:34,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673920.0, ans=0.1 2023-11-28 21:19:43,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-28 21:19:47,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3673986.6666666665, ans=0.125 2023-11-28 21:19:49,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673986.6666666665, ans=0.1 2023-11-28 21:19:51,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 9.118e+01 9.727e+01 1.019e+02 1.264e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 21:19:57,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3674053.3333333335, ans=0.0 2023-11-28 21:20:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3674053.3333333335, ans=0.125 2023-11-28 21:20:14,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3674120.0, ans=0.2 2023-11-28 21:20:20,092 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10050, loss[loss=0.06975, simple_loss=0.09463, pruned_loss=0.01321, audio_tagging_loss=0.009229, over 15610.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08785, pruned_loss=0.01149, audio_tagging_loss=0.008571, over 3046557.53 frames. 
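For reading the tot_loss[...] summaries: the logged components combine into the headline loss with fixed weights. The numbers in this section are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; the scales in the sketch below are inferred from the logged values rather than taken from the training script.

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Scales inferred by fitting the logged component values; see check below.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 9150 above: 0.5 * 0.09437 + 0.01407 + 0.008788
print(combined_loss(0.09437, 0.01407, 0.008788))  # 0.070043 ~ the logged 0.07005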
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:20:25,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3674186.6666666665, ans=0.2 2023-11-28 21:20:41,099 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:20:46,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-28 21:20:52,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3674320.0, ans=0.0 2023-11-28 21:21:08,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3674453.3333333335, ans=0.2 2023-11-28 21:21:21,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3674520.0, ans=0.025 2023-11-28 21:21:22,607 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10100, loss[loss=0.0749, simple_loss=0.1019, pruned_loss=0.01588, audio_tagging_loss=0.008052, over 14878.00 frames. ], tot_loss[loss=0.06383, simple_loss=0.08748, pruned_loss=0.01147, audio_tagging_loss=0.008617, over 3051615.54 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:21:29,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3674520.0, ans=0.1 2023-11-28 21:21:42,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3674586.6666666665, ans=0.125 2023-11-28 21:21:46,939 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-28 21:21:55,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.986e+01 9.697e+01 1.060e+02 1.407e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 21:22:03,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3674720.0, ans=0.0 2023-11-28 21:22:13,489 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:22:24,685 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10150, loss[loss=0.04578, simple_loss=0.05134, pruned_loss=0.006804, audio_tagging_loss=0.01331, over 16348.00 frames. ], tot_loss[loss=0.06345, simple_loss=0.08673, pruned_loss=0.01139, audio_tagging_loss=0.0087, over 3046906.03 frames. 
], batch size: 65, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:22:27,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3674853.3333333335, ans=0.0 2023-11-28 21:22:27,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3674853.3333333335, ans=0.125 2023-11-28 21:22:29,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3674853.3333333335, ans=0.0 2023-11-28 21:22:32,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2023-11-28 21:22:49,327 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-28 21:22:49,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3674986.6666666665, ans=0.0 2023-11-28 21:22:50,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3674986.6666666665, ans=0.125 2023-11-28 21:22:54,667 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:22:57,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-28 21:23:18,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-28 21:23:26,849 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10200, loss[loss=0.06535, simple_loss=0.0872, pruned_loss=0.01247, audio_tagging_loss=0.009278, over 15863.00 frames. ], tot_loss[loss=0.064, simple_loss=0.08748, pruned_loss=0.0116, audio_tagging_loss=0.008663, over 3045302.78 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:23:37,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3675253.3333333335, ans=0.125 2023-11-28 21:23:44,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3675253.3333333335, ans=0.035 2023-11-28 21:23:45,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3675253.3333333335, ans=0.0 2023-11-28 21:23:50,158 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 21:23:51,346 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-28 21:23:55,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3675320.0, ans=0.0 2023-11-28 21:24:00,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.789e+01 9.521e+01 1.014e+02 1.270e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 21:24:12,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3675386.6666666665, ans=0.0 2023-11-28 21:24:13,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2023-11-28 21:24:28,432 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10250, loss[loss=0.05714, simple_loss=0.0772, pruned_loss=0.009582, audio_tagging_loss=0.008961, over 14389.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08761, pruned_loss=0.01168, audio_tagging_loss=0.008722, over 3042698.31 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:24:38,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3675520.0, ans=0.1 2023-11-28 21:24:53,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-28 21:24:53,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3675653.3333333335, ans=0.125 2023-11-28 21:24:58,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5 2023-11-28 21:24:59,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.89 vs. limit=22.5 2023-11-28 21:25:30,792 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10300, loss[loss=0.07448, simple_loss=0.1061, pruned_loss=0.016, audio_tagging_loss=0.005417, over 14874.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08793, pruned_loss=0.01165, audio_tagging_loss=0.00881, over 3042181.82 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:25:36,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3675853.3333333335, ans=0.02 2023-11-28 21:25:54,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3675986.6666666665, ans=0.2 2023-11-28 21:25:54,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2023-11-28 21:25:55,039 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-28 21:25:56,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2023-11-28 21:26:00,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.41 vs. 
limit=15.0 2023-11-28 21:26:04,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 9.006e+01 9.446e+01 1.010e+02 1.376e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 21:26:09,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3676053.3333333335, ans=0.125 2023-11-28 21:26:32,632 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10350, loss[loss=0.07271, simple_loss=0.1058, pruned_loss=0.01323, audio_tagging_loss=0.006598, over 15447.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08822, pruned_loss=0.01183, audio_tagging_loss=0.008873, over 3038415.16 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:26:43,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3676253.3333333335, ans=0.0 2023-11-28 21:26:56,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-28 21:27:21,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2023-11-28 21:27:33,584 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10400, loss[loss=0.06466, simple_loss=0.09679, pruned_loss=0.008957, audio_tagging_loss=0.007311, over 14627.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08911, pruned_loss=0.01201, audio_tagging_loss=0.008867, over 3042009.85 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:27:58,430 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-28 21:27:58,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3676653.3333333335, ans=0.125 2023-11-28 21:28:07,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.953e+01 9.462e+01 1.003e+02 1.279e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 21:28:35,120 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10450, loss[loss=0.05608, simple_loss=0.08447, pruned_loss=0.008576, audio_tagging_loss=0.005274, over 14028.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08964, pruned_loss=0.01214, audio_tagging_loss=0.008762, over 3038751.92 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:28:46,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3676920.0, ans=0.125 2023-11-28 21:29:00,193 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-28 21:29:02,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.78 vs. 
limit=15.0 2023-11-28 21:29:03,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3676986.6666666665, ans=0.125 2023-11-28 21:29:08,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3676986.6666666665, ans=0.1 2023-11-28 21:29:10,180 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:29:34,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677120.0, ans=0.1 2023-11-28 21:29:37,899 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10500, loss[loss=0.05799, simple_loss=0.07566, pruned_loss=0.01061, audio_tagging_loss=0.009545, over 15310.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08929, pruned_loss=0.01204, audio_tagging_loss=0.008647, over 3039689.40 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:29:38,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3677186.6666666665, ans=6.0 2023-11-28 21:29:40,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677186.6666666665, ans=0.0 2023-11-28 21:29:42,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3677186.6666666665, ans=0.125 2023-11-28 21:29:47,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3677186.6666666665, ans=0.0 2023-11-28 21:30:02,187 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-28 21:30:11,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.780e+01 9.536e+01 1.016e+02 1.256e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 21:30:39,057 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10550, loss[loss=0.06131, simple_loss=0.07852, pruned_loss=0.01324, audio_tagging_loss=0.008815, over 14361.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08857, pruned_loss=0.012, audio_tagging_loss=0.008599, over 3035549.68 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:31:02,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3677653.3333333335, ans=0.95 2023-11-28 21:31:04,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-28 21:31:04,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3677653.3333333335, ans=0.125 2023-11-28 21:31:30,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3677786.6666666665, ans=0.125 2023-11-28 21:31:40,760 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10600, loss[loss=0.04543, simple_loss=0.06242, pruned_loss=0.005861, audio_tagging_loss=0.008357, over 13772.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08931, pruned_loss=0.01222, audio_tagging_loss=0.008473, over 3034752.68 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:32:06,020 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-28 21:32:14,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 9.049e+01 9.693e+01 1.043e+02 1.339e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 21:32:14,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3677986.6666666665, ans=0.125 2023-11-28 21:32:18,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3678053.3333333335, ans=0.125 2023-11-28 21:32:26,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3678053.3333333335, ans=0.125 2023-11-28 21:32:36,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 21:32:43,470 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10650, loss[loss=0.0481, simple_loss=0.06861, pruned_loss=0.004708, audio_tagging_loss=0.009084, over 15102.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0891, pruned_loss=0.01222, audio_tagging_loss=0.008489, over 3035328.23 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:32:50,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3678186.6666666665, ans=0.125 2023-11-28 21:33:05,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3678253.3333333335, ans=0.2 2023-11-28 21:33:08,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-28 21:33:09,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3678320.0, ans=0.0 2023-11-28 21:33:24,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3678386.6666666665, ans=0.2 2023-11-28 21:33:29,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0 2023-11-28 21:33:45,308 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10700, loss[loss=0.06201, simple_loss=0.07448, pruned_loss=0.01426, audio_tagging_loss=0.01051, over 14458.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08856, pruned_loss=0.01211, audio_tagging_loss=0.008613, over 3036718.67 frames. 
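The grad_scale field that moves between 8.0, 16.0 and 32.0 in these records is the fp16 loss scale: it grows after a stretch of overflow-free steps and halves when an overflow is detected, which is the torch.cuda.amp.GradScaler contract. A minimal sketch of the pattern follows (it needs a CUDA device; the model, optimizer and learning rate are placeholders, not the training setup).

import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    opt.zero_grad()
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # gradients carry the current scale
    scaler.step(opt)               # unscales; skips the step on inf/nan
    scaler.update()                # grows after stable stretches, halves on overflow
    print(scaler.get_scale())      # the value the log reports as grad_scale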
], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:33:53,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3678520.0, ans=0.0 2023-11-28 21:34:01,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3678586.6666666665, ans=0.125 2023-11-28 21:34:10,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-28 21:34:13,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3678653.3333333335, ans=0.125 2023-11-28 21:34:19,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.110e+01 9.632e+01 1.031e+02 2.472e+02, threshold=1.926e+02, percent-clipped=1.0 2023-11-28 21:34:25,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3678720.0, ans=15.0 2023-11-28 21:34:37,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3678786.6666666665, ans=0.125 2023-11-28 21:34:38,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-28 21:34:48,470 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10750, loss[loss=0.0715, simple_loss=0.09993, pruned_loss=0.01376, audio_tagging_loss=0.00777, over 14911.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08919, pruned_loss=0.0122, audio_tagging_loss=0.008545, over 3039233.36 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:34:59,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3678920.0, ans=0.2 2023-11-28 21:35:08,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3678920.0, ans=0.0 2023-11-28 21:35:11,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3678920.0, ans=0.1 2023-11-28 21:35:13,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-28 21:35:24,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3679053.3333333335, ans=0.0 2023-11-28 21:35:27,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3679053.3333333335, ans=0.2 2023-11-28 21:35:31,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3679053.3333333335, ans=0.125 2023-11-28 21:35:37,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3679120.0, ans=0.1 2023-11-28 21:35:43,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3679120.0, ans=0.2 2023-11-28 21:35:48,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-28 21:35:49,811 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10800, loss[loss=0.06145, simple_loss=0.09606, pruned_loss=0.007913, audio_tagging_loss=0.005511, over 15544.00 frames. 
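Many ScheduledFloat names above end in balancer1.prob, balancer2.min_positive, max_abs and so on; these belong to "balancer" modules that keep per-channel activation statistics (fraction positive, mean magnitude) inside a target range by injecting a small corrective gradient while leaving the forward value untouched. Below is a speculative sketch of the positivity part via a gradient hook; the penalty form and the scale constant are assumptions, not the scaling.py implementation.

import torch

def balance_positivity(x, min_positive=0.05, max_positive=0.95, scale=0.04):
    # Per-channel fraction of positive activations, measured without grad.
    frac_pos = (x.detach() > 0).float().mean(dim=0)
    too_low = (frac_pos < min_positive).float()   # channels that should go up
    too_high = (frac_pos > max_positive).float()  # channels that should go down
    nudge = scale * (too_low - too_high)          # assumed penalty form
    if x.requires_grad and nudge.abs().sum() > 0:
        # Forward value is untouched; the backward pass is biased so SGD
        # pushes offending channels back into [min_positive, max_positive].
        x.register_hook(lambda grad: grad - nudge)
    return x

x = torch.randn(100, 256, requires_grad=True)
y = balance_positivity(torch.relu(x) - 2.0)  # heavily negative: most channels too low
y.sum().backward()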
], tot_loss[loss=0.06547, simple_loss=0.08962, pruned_loss=0.01219, audio_tagging_loss=0.008466, over 3036545.85 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:36:02,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2023-11-28 21:36:15,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-28 21:36:24,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.859e+01 9.432e+01 1.041e+02 1.593e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:36:33,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3679386.6666666665, ans=10.0 2023-11-28 21:36:37,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3679386.6666666665, ans=0.0 2023-11-28 21:36:51,869 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10850, loss[loss=0.06776, simple_loss=0.1003, pruned_loss=0.01284, audio_tagging_loss=0.004765, over 15077.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08862, pruned_loss=0.01219, audio_tagging_loss=0.008551, over 3044103.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:36:59,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3679520.0, ans=0.07 2023-11-28 21:37:16,582 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-28 21:37:20,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3679653.3333333335, ans=0.125 2023-11-28 21:37:29,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3679720.0, ans=0.2 2023-11-28 21:37:37,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-28 21:37:40,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3679786.6666666665, ans=0.1 2023-11-28 21:37:50,047 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:37:53,416 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10900, loss[loss=0.07389, simple_loss=0.09777, pruned_loss=0.01557, audio_tagging_loss=0.009433, over 15067.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08872, pruned_loss=0.01228, audio_tagging_loss=0.008694, over 3050067.41 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:38:08,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3679920.0, ans=0.1 2023-11-28 21:38:18,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-28 21:38:19,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3679986.6666666665, ans=0.125 2023-11-28 21:38:25,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3679986.6666666665, ans=0.125 2023-11-28 21:38:26,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3679986.6666666665, ans=0.0 2023-11-28 21:38:30,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 9.027e+01 9.612e+01 1.023e+02 1.534e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 21:38:34,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680053.3333333335, ans=0.1 2023-11-28 21:38:35,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3680053.3333333335, ans=0.05 2023-11-28 21:38:38,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-28 21:38:43,550 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:38:45,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680120.0, ans=0.1 2023-11-28 21:38:47,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3680120.0, ans=0.125 2023-11-28 21:38:49,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680120.0, ans=0.1 2023-11-28 21:38:54,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3680120.0, ans=0.125 2023-11-28 21:38:57,988 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 10950, loss[loss=0.04379, simple_loss=0.04732, pruned_loss=0.008803, audio_tagging_loss=0.01133, over 14087.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0894, pruned_loss=0.01239, audio_tagging_loss=0.008625, over 3049569.47 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:39:01,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680186.6666666665, ans=0.1 2023-11-28 21:39:06,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3680186.6666666665, ans=0.125 2023-11-28 21:39:14,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3680253.3333333335, ans=0.0 2023-11-28 21:39:21,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3680253.3333333335, ans=0.09899494936611666 2023-11-28 21:39:23,183 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552050 2023-11-28 21:39:25,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680320.0, ans=0.125 2023-11-28 21:39:30,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-28 21:39:33,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3680386.6666666665, ans=0.0 2023-11-28 21:39:33,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3680386.6666666665, ans=0.0 2023-11-28 21:39:33,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680386.6666666665, ans=0.1 2023-11-28 21:39:40,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3680386.6666666665, ans=0.0 2023-11-28 21:39:59,058 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11000, loss[loss=0.07519, simple_loss=0.1, pruned_loss=0.01202, audio_tagging_loss=0.01317, over 15334.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09069, pruned_loss=0.01244, audio_tagging_loss=0.00858, over 3056977.38 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:39:59,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3680520.0, ans=0.125 2023-11-28 21:40:09,745 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 21:40:13,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3680586.6666666665, ans=0.2 2023-11-28 21:40:14,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3680586.6666666665, ans=0.0 2023-11-28 21:40:23,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552100 2023-11-28 21:40:26,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3680653.3333333335, ans=0.125 2023-11-28 21:40:31,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3680653.3333333335, ans=0.95 2023-11-28 21:40:32,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-28 21:40:33,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.817e+01 9.387e+01 9.947e+01 1.401e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 21:41:00,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3680853.3333333335, ans=0.125 2023-11-28 21:41:01,333 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11050, loss[loss=0.05848, simple_loss=0.06998, pruned_loss=0.01133, audio_tagging_loss=0.01216, over 14761.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09005, pruned_loss=0.01238, audio_tagging_loss=0.008758, over 3057063.17 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:41:16,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-28 21:41:17,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680920.0, ans=0.125 2023-11-28 21:41:21,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3680920.0, ans=0.0 2023-11-28 21:41:24,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3680986.6666666665, ans=0.125 2023-11-28 21:41:25,761 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552150 2023-11-28 21:41:27,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680986.6666666665, ans=0.1 2023-11-28 21:41:39,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3681053.3333333335, ans=0.125 2023-11-28 21:41:47,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3681053.3333333335, ans=0.0 2023-11-28 21:42:02,720 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11100, loss[loss=0.05337, simple_loss=0.07065, pruned_loss=0.008385, audio_tagging_loss=0.009661, over 15703.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08952, pruned_loss=0.01221, audio_tagging_loss=0.008843, over 3056599.90 frames. 
], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:42:06,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=22.5 2023-11-28 21:42:17,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3681253.3333333335, ans=0.0 2023-11-28 21:42:18,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2023-11-28 21:42:27,664 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-28 21:42:37,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.965e+01 9.771e+01 1.048e+02 1.332e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-28 21:42:39,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3681386.6666666665, ans=0.125 2023-11-28 21:42:50,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3681386.6666666665, ans=0.0 2023-11-28 21:42:55,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681453.3333333335, ans=0.125 2023-11-28 21:43:04,746 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11150, loss[loss=0.0657, simple_loss=0.08613, pruned_loss=0.01228, audio_tagging_loss=0.01035, over 15569.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08988, pruned_loss=0.01229, audio_tagging_loss=0.008835, over 3055780.46 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:43:12,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2023-11-28 21:43:18,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3681586.6666666665, ans=0.2 2023-11-28 21:43:24,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.43 vs. 
limit=22.5 2023-11-28 21:43:28,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3681653.3333333335, ans=0.125 2023-11-28 21:43:28,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3681653.3333333335, ans=0.0 2023-11-28 21:43:28,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3681653.3333333335, ans=0.04949747468305833 2023-11-28 21:43:29,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-28 21:43:44,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3681720.0, ans=0.125 2023-11-28 21:43:51,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3681720.0, ans=0.125 2023-11-28 21:43:53,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3681786.6666666665, ans=0.0 2023-11-28 21:44:06,234 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11200, loss[loss=0.09277, simple_loss=0.1372, pruned_loss=0.01924, audio_tagging_loss=0.004943, over 15168.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09098, pruned_loss=0.01248, audio_tagging_loss=0.008853, over 3058284.49 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:44:24,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3681920.0, ans=0.125 2023-11-28 21:44:28,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3681920.0, ans=0.1 2023-11-28 21:44:30,338 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-28 21:44:30,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3681986.6666666665, ans=0.0 2023-11-28 21:44:41,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.920e+01 9.511e+01 1.032e+02 1.205e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:44:44,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-28 21:45:08,017 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11250, loss[loss=0.05074, simple_loss=0.05684, pruned_loss=0.008259, audio_tagging_loss=0.01406, over 15702.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09074, pruned_loss=0.01232, audio_tagging_loss=0.008815, over 3057700.29 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:45:32,210 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-28 21:45:45,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3682386.6666666665, ans=0.125 2023-11-28 21:45:45,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3682386.6666666665, ans=0.0 2023-11-28 21:45:47,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. 
limit=15.0 2023-11-28 21:45:52,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3682386.6666666665, ans=0.125 2023-11-28 21:45:58,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5 2023-11-28 21:46:02,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-28 21:46:09,208 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11300, loss[loss=0.06697, simple_loss=0.1002, pruned_loss=0.01012, audio_tagging_loss=0.006766, over 16579.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0908, pruned_loss=0.01235, audio_tagging_loss=0.008651, over 3062088.08 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:46:12,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2023-11-28 21:46:24,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682586.6666666665, ans=0.1 2023-11-28 21:46:34,700 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-28 21:46:46,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.990e+01 9.658e+01 1.057e+02 1.418e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 21:47:12,721 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11350, loss[loss=0.05919, simple_loss=0.07531, pruned_loss=0.01261, audio_tagging_loss=0.008927, over 14975.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09054, pruned_loss=0.01221, audio_tagging_loss=0.008493, over 3060327.25 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:47:15,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3682853.3333333335, ans=0.125 2023-11-28 21:47:26,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3682920.0, ans=0.125 2023-11-28 21:47:37,440 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-28 21:47:53,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3683053.3333333335, ans=0.125 2023-11-28 21:47:55,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3683053.3333333335, ans=0.125 2023-11-28 21:48:14,227 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11400, loss[loss=0.07788, simple_loss=0.108, pruned_loss=0.01726, audio_tagging_loss=0.006646, over 15242.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09079, pruned_loss=0.01233, audio_tagging_loss=0.008371, over 3050961.21 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:48:30,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3683253.3333333335, ans=0.0 2023-11-28 21:48:38,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-28 21:48:39,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3683320.0, ans=0.0 2023-11-28 21:48:49,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.845e+01 9.711e+01 1.056e+02 1.187e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 21:48:59,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3683386.6666666665, ans=0.0 2023-11-28 21:49:02,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3683453.3333333335, ans=0.1 2023-11-28 21:49:05,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3683453.3333333335, ans=0.0 2023-11-28 21:49:16,157 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11450, loss[loss=0.07991, simple_loss=0.1166, pruned_loss=0.0158, audio_tagging_loss=0.005819, over 14540.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09056, pruned_loss=0.01233, audio_tagging_loss=0.008406, over 3050193.16 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:49:20,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-11-28 21:49:21,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2023-11-28 21:49:28,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2023-11-28 21:49:39,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3683653.3333333335, ans=0.125 2023-11-28 21:49:39,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-28 21:49:40,260 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-28 21:49:54,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3683720.0, ans=0.035 2023-11-28 21:50:16,996 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11500, loss[loss=0.05642, simple_loss=0.07831, pruned_loss=0.008977, audio_tagging_loss=0.00829, over 15788.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08971, pruned_loss=0.01209, audio_tagging_loss=0.008433, over 3054112.38 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:50:25,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3683853.3333333335, ans=0.0 2023-11-28 21:50:29,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3683920.0, ans=0.0 2023-11-28 21:50:30,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3683920.0, ans=0.125 2023-11-28 21:50:42,305 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-28 21:50:53,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 8.799e+01 9.432e+01 1.014e+02 1.264e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:50:56,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3684053.3333333335, ans=0.1 2023-11-28 21:51:11,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3684120.0, ans=0.2 2023-11-28 21:51:11,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3684120.0, ans=0.0 2023-11-28 21:51:18,539 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11550, loss[loss=0.06994, simple_loss=0.1047, pruned_loss=0.01258, audio_tagging_loss=0.005014, over 16056.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08983, pruned_loss=0.01216, audio_tagging_loss=0.008447, over 3048590.63 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:51:44,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-28 21:51:57,615 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:52:05,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3684386.6666666665, ans=0.125 2023-11-28 21:52:20,653 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11600, loss[loss=0.07984, simple_loss=0.1031, pruned_loss=0.01803, audio_tagging_loss=0.01024, over 15383.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09083, pruned_loss=0.01224, audio_tagging_loss=0.008422, over 3048381.68 frames. 
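The WARNING above shows the filter at train_asr.py:1481 at work: AudioSet clips carry the dummy transcript, and after the frontend's subsampling a 1-second cut keeps only 23 frames, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is excluded. A minimal sketch of such a check follows; the subsampling arithmetic reproduces the logged 100 -> 23, but the function names are illustrative rather than icefall's exact API:

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 stages: T -> ((T - 7) // 2 + 1) // 2.
        # For the logged cut: ((100 - 7) // 2 + 1) // 2 = 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer alignment needs at least one frame per token,
        # so cuts with T' < U are dropped (23 < 24 in the warning above).
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert keep_cut(100, 24) is False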
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:52:27,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3684520.0, ans=0.1 2023-11-28 21:52:45,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-28 21:52:46,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3684653.3333333335, ans=0.0 2023-11-28 21:52:55,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.024e+01 9.602e+01 1.030e+02 1.712e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 21:52:57,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3684720.0, ans=0.1 2023-11-28 21:53:08,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3684786.6666666665, ans=0.0 2023-11-28 21:53:11,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-28 21:53:13,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3684786.6666666665, ans=0.0 2023-11-28 21:53:13,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2023-11-28 21:53:15,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3684786.6666666665, ans=0.0 2023-11-28 21:53:21,380 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11650, loss[loss=0.05397, simple_loss=0.07494, pruned_loss=0.006215, audio_tagging_loss=0.01028, over 16444.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09029, pruned_loss=0.01222, audio_tagging_loss=0.008381, over 3042582.10 frames. 
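The grad_scale field at the end of each batch line is the dynamic loss scale of the fp16 run: it reads 16.0 for batches 11450 through 11550, doubles to 32.0 by batch 11600, and earlier in this log already went 16 -> 32 -> 16 across batches 11150-11250, the signature of a scaler that doubles after a run of overflow-free steps and halves when a gradient overflows. A sketch of the same mechanism with PyTorch's stock GradScaler; the growth interval and factors shown are PyTorch defaults, assumptions rather than values read from this log:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # matches the scale logged at batch 11450
        growth_factor=2.0,     # doubles the scale (16 -> 32 by batch 11600)
        backoff_factor=0.5,    # halves it after an inf/nan gradient
        growth_interval=2000,  # clean steps required before growing (default)
    )

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on overflow
        scaler.update()                # grows or backs off the scale
        return loss.detach(), scaler.get_scale()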
], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:53:25,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3684853.3333333335, ans=0.0 2023-11-28 21:53:37,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3684920.0, ans=0.05 2023-11-28 21:53:46,733 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-28 21:53:54,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3684986.6666666665, ans=0.035 2023-11-28 21:53:55,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3684986.6666666665, ans=0.125 2023-11-28 21:54:10,786 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:54:17,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3685120.0, ans=0.125 2023-11-28 21:54:17,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3685120.0, ans=0.0 2023-11-28 21:54:18,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3685120.0, ans=0.125 2023-11-28 21:54:21,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3685186.6666666665, ans=0.07 2023-11-28 21:54:22,847 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11700, loss[loss=0.06991, simple_loss=0.09508, pruned_loss=0.01468, audio_tagging_loss=0.007692, over 14466.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09007, pruned_loss=0.01222, audio_tagging_loss=0.008428, over 3043160.74 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:54:34,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3685253.3333333335, ans=0.1 2023-11-28 21:54:48,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-28 21:54:49,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3685320.0, ans=0.125 2023-11-28 21:54:52,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3685320.0, ans=0.2 2023-11-28 21:54:59,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.206e+01 9.735e+01 1.055e+02 1.331e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 21:55:00,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3685386.6666666665, ans=0.125 2023-11-28 21:55:02,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685386.6666666665, ans=0.0 2023-11-28 21:55:08,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3685386.6666666665, ans=10.0 2023-11-28 21:55:10,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. 
limit=12.0 2023-11-28 21:55:13,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3685453.3333333335, ans=0.0 2023-11-28 21:55:17,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3685453.3333333335, ans=0.2 2023-11-28 21:55:24,989 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11750, loss[loss=0.0566, simple_loss=0.06685, pruned_loss=0.009722, audio_tagging_loss=0.01346, over 14327.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08909, pruned_loss=0.01211, audio_tagging_loss=0.008555, over 3038061.56 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:55:25,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3685520.0, ans=0.125 2023-11-28 21:55:35,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3685520.0, ans=0.0 2023-11-28 21:55:49,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-28 21:55:49,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:55:59,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:56:21,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3685786.6666666665, ans=0.125 2023-11-28 21:56:26,109 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11800, loss[loss=0.05092, simple_loss=0.05974, pruned_loss=0.009533, audio_tagging_loss=0.01152, over 15522.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08874, pruned_loss=0.01203, audio_tagging_loss=0.008606, over 3038823.69 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:56:34,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685853.3333333335, ans=0.125 2023-11-28 21:56:36,293 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:56:51,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-28 21:56:59,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685986.6666666665, ans=0.0 2023-11-28 21:57:02,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.813e+01 9.510e+01 1.037e+02 1.447e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:57:05,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-11-28 21:57:06,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3686053.3333333335, ans=0.125 2023-11-28 21:57:06,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3686053.3333333335, ans=0.0 2023-11-28 21:57:08,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.97 vs. 
limit=15.0 2023-11-28 21:57:11,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3686053.3333333335, ans=0.0 2023-11-28 21:57:23,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3686120.0, ans=0.125 2023-11-28 21:57:26,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2023-11-28 21:57:28,219 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11850, loss[loss=0.06624, simple_loss=0.08689, pruned_loss=0.01507, audio_tagging_loss=0.00773, over 14467.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0894, pruned_loss=0.01202, audio_tagging_loss=0.00864, over 3042128.33 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:57:31,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2023-11-28 21:57:51,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3686253.3333333335, ans=0.2 2023-11-28 21:57:53,515 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-28 21:57:56,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3686320.0, ans=0.0 2023-11-28 21:57:59,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2023-11-28 21:58:02,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3686320.0, ans=0.0 2023-11-28 21:58:11,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686386.6666666665, ans=0.1 2023-11-28 21:58:29,196 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11900, loss[loss=0.05158, simple_loss=0.06685, pruned_loss=0.007891, audio_tagging_loss=0.01027, over 14293.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08823, pruned_loss=0.01186, audio_tagging_loss=0.008815, over 3043336.82 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:58:53,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2023-11-28 21:58:54,597 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-28 21:59:05,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.697e+01 9.440e+01 1.029e+02 1.196e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 21:59:23,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3686786.6666666665, ans=0.0 2023-11-28 21:59:32,217 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 11950, loss[loss=0.07533, simple_loss=0.1083, pruned_loss=0.01311, audio_tagging_loss=0.008082, over 15317.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08747, pruned_loss=0.01171, audio_tagging_loss=0.008938, over 3048010.21 frames. 
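The scaling.py:1022 "Whitening" lines are a per-module diagnostic: a metric of how far the activation covariance is from a multiple of the identity. It is 1.0 for perfectly white features and grows as variance concentrates in a few directions; a corrective penalty only engages once the metric exceeds the logged limit, which none of the entries above reach. A rough reconstruction of such a metric; it captures the idea (metric >= 1, with equality iff the covariance is isotropic) but is not claimed to be icefall's exact code:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns >= 1; equals 1 exactly
        # when the covariance is a multiple of the identity.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]            # (C, C) covariance
        num_channels = cov.shape[0]
        mean_diag = cov.diagonal().mean()
        return (cov ** 2).sum() / (num_channels * mean_diag ** 2)

    print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise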
], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:59:42,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3686853.3333333335, ans=0.0 2023-11-28 21:59:54,029 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:59:56,213 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-28 22:00:03,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-28 22:00:07,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3687053.3333333335, ans=0.125 2023-11-28 22:00:11,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0 2023-11-28 22:00:31,707 INFO [train_asr.py:1235] (1/4) Epoch 46, batch 12000, loss[loss=0.07295, simple_loss=0.1071, pruned_loss=0.01295, audio_tagging_loss=0.006428, over 14890.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08834, pruned_loss=0.0118, audio_tagging_loss=0.008925, over 3045096.39 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 22:00:31,709 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 22:01:00,072 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4783, 3.5286, 3.6625, 3.7014], device='cuda:1') 2023-11-28 22:01:11,992 INFO [train_asr.py:1267] (1/4) Epoch 46, validation: loss=0.05835, simple_loss=0.05054, pruned_loss=0.005304, audio_tagging_loss=0.02778, over 4681554.00 frames. 2023-11-28 22:01:11,993 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 22:01:17,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3687186.6666666665, ans=0.1 2023-11-28 22:01:33,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.53 vs. limit=10.0 2023-11-28 22:01:34,493 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-28 22:01:56,266 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 0, loss[loss=0.07612, simple_loss=0.09811, pruned_loss=0.008374, audio_tagging_loss=0.01869, over 15877.00 frames. ], tot_loss[loss=0.07612, simple_loss=0.09811, pruned_loss=0.008374, audio_tagging_loss=0.01869, over 15877.00 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:01:56,267 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 22:02:32,341 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05784, simple_loss=0.05051, pruned_loss=0.005299, audio_tagging_loss=0.02728, over 4681554.00 frames. 
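The logged totals decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: for the epoch 46 validation line above, 0.5 * 0.05054 + 0.005304 + 0.02778 = 0.058354, matching loss=0.05835, and the per-batch lines check out the same way. The simple transducer loss is thus down-weighted by half while the pruned and audio-tagging terms enter at full weight. A check against the logged numbers; the scale constants are inferred from this arithmetic, not read from the training code:

    SIMPLE_LOSS_SCALE = 0.5         # inferred from the logged totals
    AUDIO_TAGGING_LOSS_SCALE = 1.0  # inferred likewise

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss):
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # Epoch 46, batch 12000 validation line: loss=0.05835
    print(total_loss(0.05054, 0.005304, 0.02778))  # -> 0.058354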
2023-11-28 22:02:32,342 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 22:02:39,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 9.135e+01 9.831e+01 1.074e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-28 22:02:40,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3687340.0, ans=0.1 2023-11-28 22:02:59,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3687473.3333333335, ans=0.125 2023-11-28 22:02:59,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3687473.3333333335, ans=0.2 2023-11-28 22:03:24,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=22.5 2023-11-28 22:03:30,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-28 22:03:34,264 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 50, loss[loss=0.0798, simple_loss=0.09759, pruned_loss=0.01519, audio_tagging_loss=0.01582, over 14801.00 frames. ], tot_loss[loss=0.07428, simple_loss=0.09097, pruned_loss=0.0122, audio_tagging_loss=0.01659, over 690122.51 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:03:42,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3687673.3333333335, ans=0.125 2023-11-28 22:03:58,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3687806.6666666665, ans=0.125 2023-11-28 22:03:59,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3687806.6666666665, ans=0.1 2023-11-28 22:04:33,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-28 22:04:37,290 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 100, loss[loss=0.08883, simple_loss=0.1165, pruned_loss=0.01662, audio_tagging_loss=0.01393, over 14714.00 frames. ], tot_loss[loss=0.07402, simple_loss=0.0918, pruned_loss=0.01235, audio_tagging_loss=0.01577, over 1217287.00 frames. 
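Each optim.py:476 line reports five quantiles (min, 25%, median, 75%, max) of recent gradient norms next to the active threshold, and throughout this log the threshold equals Clipping_scale=2.0 times the logged median: in the entry above, 2.0 * 9.831e+01 = 1.966e+02, and with percent-clipped=0.0 no batch in that window actually exceeded it. A sketch of clipping against a scaled running median; this reconstructs the logged behaviour and is not icefall's exact optimizer code:

    from collections import deque
    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total gradient norms

        def clip_(self, parameters) -> float:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads]))
            self.norms.append(float(norm))
            # threshold = clipping_scale * median of recent norms,
            # e.g. 1.966e+02 = 2.0 * 9.831e+01 in the entry above
            threshold = self.clipping_scale * float(
                torch.tensor(list(self.norms)).median())
            if norm > threshold:  # such batches count toward percent-clipped
                for g in grads:
                    g.mul_(threshold / float(norm))
            return threshold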
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:04:44,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.649e+01 9.823e+01 1.051e+02 1.142e+02 1.295e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-28 22:04:53,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3688073.3333333335, ans=0.2 2023-11-28 22:04:56,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-28 22:04:58,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-28 22:05:02,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3688140.0, ans=0.09899494936611666 2023-11-28 22:05:05,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3688140.0, ans=0.125 2023-11-28 22:05:09,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3688140.0, ans=0.0 2023-11-28 22:05:10,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3688140.0, ans=0.2 2023-11-28 22:05:11,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3688140.0, ans=0.0 2023-11-28 22:05:25,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688206.6666666665, ans=0.1 2023-11-28 22:05:34,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3688273.3333333335, ans=0.04949747468305833 2023-11-28 22:05:36,210 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-28 22:05:40,273 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 150, loss[loss=0.06704, simple_loss=0.08996, pruned_loss=0.01189, audio_tagging_loss=0.01017, over 14945.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.09347, pruned_loss=0.01258, audio_tagging_loss=0.01381, over 1624146.54 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:05:54,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-28 22:06:00,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3688406.6666666665, ans=0.1 2023-11-28 22:06:02,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3688406.6666666665, ans=0.2 2023-11-28 22:06:05,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3688473.3333333335, ans=0.07 2023-11-28 22:06:39,513 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-28 22:06:42,874 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 200, loss[loss=0.08322, simple_loss=0.1214, pruned_loss=0.01224, audio_tagging_loss=0.01028, over 16144.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09424, pruned_loss=0.01258, audio_tagging_loss=0.0122, over 1948740.64 frames. 
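The learning rate is flat at 1.46e-03 through all of epoch 46 and steps to 1.44e-03 as epoch 47 begins, so at this depth of training only the epoch count still moves the schedule. That is consistent with an Eden-style schedule decaying as a -0.25 power in both batch and epoch counts; a hedged sketch follows, where the base rate and the two half-life constants are assumptions chosen so that completed-epoch counts of 45 and 46 reproduce the two logged rates:

    def eden_lr(base_lr: float, batch: int, epochs_done: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # -0.25-power decay in both counters; at batch ~5.5e5 the batch
        # factor is essentially constant, so only epoch rollovers matter.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epochs_done ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, 553_000, 45))  # ~1.46e-03 (the epoch 46 lines)
    print(eden_lr(0.045, 554_000, 46))  # ~1.44e-03 (the epoch 47 lines)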
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:06:49,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3688673.3333333335, ans=0.125 2023-11-28 22:06:51,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 9.056e+01 9.738e+01 1.064e+02 1.248e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 22:06:54,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3688740.0, ans=0.1 2023-11-28 22:07:11,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3688806.6666666665, ans=0.125 2023-11-28 22:07:24,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3688873.3333333335, ans=0.05 2023-11-28 22:07:30,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-28 22:07:37,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3688940.0, ans=0.0 2023-11-28 22:07:41,030 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-28 22:07:44,532 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 250, loss[loss=0.09486, simple_loss=0.1336, pruned_loss=0.0231, audio_tagging_loss=0.004983, over 16019.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09394, pruned_loss=0.01295, audio_tagging_loss=0.01117, over 2193578.02 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:07:47,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-28 22:07:54,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3689006.6666666665, ans=0.2 2023-11-28 22:08:02,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3689073.3333333335, ans=0.125 2023-11-28 22:08:10,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3689140.0, ans=0.125 2023-11-28 22:08:13,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3689140.0, ans=0.125 2023-11-28 22:08:30,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3689206.6666666665, ans=0.0 2023-11-28 22:08:38,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3689273.3333333335, ans=0.0 2023-11-28 22:08:42,602 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-28 22:08:46,415 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 300, loss[loss=0.07824, simple_loss=0.1084, pruned_loss=0.01584, audio_tagging_loss=0.008201, over 15619.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09317, pruned_loss=0.0128, audio_tagging_loss=0.01042, over 2376160.74 frames. 
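The many scaling.py:213 "ScheduledFloat" lines all follow one pattern: a scalar hyperparameter (a dropout p, a balancer prob, a skip rate) whose current value, printed as ans=..., is looked up from the global batch_count, which is why the same values (0.125, 0.1, 0.2, 0.0) recur across modules at the same count. A minimal piecewise-linear version of the idea; the breakpoints below are invented for illustration, and the real schedules differ per parameter:

    import bisect

    class ScheduledFloat:
        # A float that depends on the training batch count, interpolated
        # linearly between (batch_count, value) breakpoints.
        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative: a skip rate annealed from 0.1 to 0.0 over the first
    # 20k batches; by batch_count ~3.69e6 it is pinned at its final value.
    skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
    print(skip_rate.value(3688140.0))  # -> 0.0, as in the ans=0.0 lines above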
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:08:50,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3689340.0, ans=0.125 2023-11-28 22:08:55,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 9.275e+01 9.937e+01 1.062e+02 1.967e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-28 22:09:03,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3689406.6666666665, ans=0.09899494936611666 2023-11-28 22:09:14,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-28 22:09:20,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-28 22:09:41,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3689606.6666666665, ans=0.125 2023-11-28 22:09:44,105 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-28 22:09:48,036 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 350, loss[loss=0.08793, simple_loss=0.1252, pruned_loss=0.01611, audio_tagging_loss=0.009236, over 16174.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09273, pruned_loss=0.0126, audio_tagging_loss=0.009895, over 2531971.07 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:10:23,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689873.3333333335, ans=0.1 2023-11-28 22:10:31,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3689873.3333333335, ans=0.2 2023-11-28 22:10:44,899 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-28 22:10:48,601 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 400, loss[loss=0.05572, simple_loss=0.07591, pruned_loss=0.007981, audio_tagging_loss=0.009784, over 14026.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09089, pruned_loss=0.01216, audio_tagging_loss=0.009505, over 2641265.94 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:10:56,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3690006.6666666665, ans=0.125 2023-11-28 22:10:56,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.027e+01 9.535e+01 1.022e+02 1.341e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 22:11:02,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. 
limit=15.0 2023-11-28 22:11:12,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3690140.0, ans=0.0 2023-11-28 22:11:14,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3690140.0, ans=0.07 2023-11-28 22:11:33,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3690206.6666666665, ans=0.0 2023-11-28 22:11:42,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3690273.3333333335, ans=0.0 2023-11-28 22:11:47,907 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-28 22:11:51,264 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 450, loss[loss=0.05964, simple_loss=0.07404, pruned_loss=0.009509, audio_tagging_loss=0.01311, over 15237.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08986, pruned_loss=0.01194, audio_tagging_loss=0.009312, over 2730360.06 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:12:02,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690406.6666666665, ans=0.1 2023-11-28 22:12:12,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3690406.6666666665, ans=0.125 2023-11-28 22:12:18,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3690473.3333333335, ans=0.125 2023-11-28 22:12:20,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3690473.3333333335, ans=0.125 2023-11-28 22:12:49,035 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-28 22:12:52,917 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 500, loss[loss=0.04721, simple_loss=0.06347, pruned_loss=0.006876, audio_tagging_loss=0.008605, over 15404.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0898, pruned_loss=0.01201, audio_tagging_loss=0.009146, over 2800663.40 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:12:56,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3690673.3333333335, ans=0.125 2023-11-28 22:13:01,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.926e+01 9.624e+01 1.054e+02 1.218e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:13:02,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3690673.3333333335, ans=0.125 2023-11-28 22:13:07,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3690740.0, ans=0.0 2023-11-28 22:13:08,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3690740.0, ans=0.025 2023-11-28 22:13:21,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3690806.6666666665, ans=0.1 2023-11-28 22:13:22,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.49 vs. 
limit=15.0 2023-11-28 22:13:25,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3690806.6666666665, ans=0.2 2023-11-28 22:13:39,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3690873.3333333335, ans=0.125 2023-11-28 22:13:39,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3690873.3333333335, ans=0.04949747468305833 2023-11-28 22:13:51,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-28 22:13:51,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3690940.0, ans=0.125 2023-11-28 22:13:55,650 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 550, loss[loss=0.06463, simple_loss=0.08723, pruned_loss=0.01317, audio_tagging_loss=0.007843, over 15281.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.0897, pruned_loss=0.01203, audio_tagging_loss=0.009111, over 2848168.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:14:13,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2023-11-28 22:14:14,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3691073.3333333335, ans=0.0 2023-11-28 22:14:35,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3691206.6666666665, ans=0.1 2023-11-28 22:14:39,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-28 22:14:42,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.07 vs. limit=22.5 2023-11-28 22:14:42,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3691206.6666666665, ans=0.1 2023-11-28 22:14:46,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3691273.3333333335, ans=0.2 2023-11-28 22:14:49,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3691273.3333333335, ans=0.1 2023-11-28 22:14:53,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-28 22:14:57,457 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 600, loss[loss=0.07917, simple_loss=0.1161, pruned_loss=0.01567, audio_tagging_loss=0.005432, over 15477.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08897, pruned_loss=0.01206, audio_tagging_loss=0.009058, over 2890884.53 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:14:59,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2023-11-28 22:15:01,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.20 vs. 
limit=15.0 2023-11-28 22:15:06,291 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.960e+01 9.634e+01 1.013e+02 1.210e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:15:11,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3691406.6666666665, ans=0.125 2023-11-28 22:15:31,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3691473.3333333335, ans=0.125 2023-11-28 22:15:55,461 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-28 22:15:58,950 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 650, loss[loss=0.05579, simple_loss=0.07261, pruned_loss=0.008309, audio_tagging_loss=0.01118, over 15429.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0888, pruned_loss=0.01202, audio_tagging_loss=0.008963, over 2918643.94 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:16:05,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3691673.3333333335, ans=0.5 2023-11-28 22:16:41,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3691873.3333333335, ans=0.05 2023-11-28 22:16:56,070 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-28 22:17:00,542 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 700, loss[loss=0.05613, simple_loss=0.07324, pruned_loss=0.01041, audio_tagging_loss=0.009105, over 14459.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08956, pruned_loss=0.01222, audio_tagging_loss=0.008926, over 2945828.03 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:17:05,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3692006.6666666665, ans=0.125 2023-11-28 22:17:09,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.938e+01 9.507e+01 1.029e+02 1.273e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 22:17:22,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-28 22:17:30,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-28 22:17:58,370 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-28 22:18:02,278 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 750, loss[loss=0.0867, simple_loss=0.1238, pruned_loss=0.01603, audio_tagging_loss=0.00876, over 15553.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08965, pruned_loss=0.01218, audio_tagging_loss=0.008883, over 2972005.57 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:18:02,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3692340.0, ans=0.07 2023-11-28 22:18:33,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3692473.3333333335, ans=0.0 2023-11-28 22:18:35,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. 
limit=6.0 2023-11-28 22:19:00,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-28 22:19:04,213 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 800, loss[loss=0.05846, simple_loss=0.08146, pruned_loss=0.009105, audio_tagging_loss=0.008628, over 15271.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08985, pruned_loss=0.01217, audio_tagging_loss=0.008851, over 2988065.16 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:19:09,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3692673.3333333335, ans=0.04949747468305833 2023-11-28 22:19:12,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.995e+01 9.559e+01 1.026e+02 1.353e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:19:30,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3692806.6666666665, ans=0.125 2023-11-28 22:19:34,039 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:19:56,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3692940.0, ans=0.0 2023-11-28 22:20:02,072 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-28 22:20:05,593 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 850, loss[loss=0.08654, simple_loss=0.129, pruned_loss=0.01524, audio_tagging_loss=0.006816, over 16758.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08937, pruned_loss=0.01209, audio_tagging_loss=0.009003, over 2997755.26 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:20:12,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3693006.6666666665, ans=0.0 2023-11-28 22:20:27,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3693073.3333333335, ans=0.07 2023-11-28 22:21:03,773 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-28 22:21:05,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3693273.3333333335, ans=0.125 2023-11-28 22:21:07,977 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 900, loss[loss=0.07064, simple_loss=0.09545, pruned_loss=0.01329, audio_tagging_loss=0.009631, over 16151.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09002, pruned_loss=0.01212, audio_tagging_loss=0.008922, over 3009052.93 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:21:12,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=15.0 2023-11-28 22:21:16,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.970e+01 9.672e+01 1.016e+02 1.262e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 22:21:21,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3693406.6666666665, ans=0.125 2023-11-28 22:21:25,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3693406.6666666665, ans=0.125 2023-11-28 22:21:39,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3693473.3333333335, ans=0.125 2023-11-28 22:21:48,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-28 22:21:54,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3693540.0, ans=0.1 2023-11-28 22:21:55,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3693540.0, ans=0.0 2023-11-28 22:22:04,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3693606.6666666665, ans=0.125 2023-11-28 22:22:06,257 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-28 22:22:10,249 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 950, loss[loss=0.06245, simple_loss=0.08918, pruned_loss=0.01175, audio_tagging_loss=0.006113, over 15618.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08978, pruned_loss=0.01212, audio_tagging_loss=0.008844, over 3016888.59 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:22:17,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3693673.3333333335, ans=0.2 2023-11-28 22:22:17,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-28 22:22:35,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693806.6666666665, ans=0.1 2023-11-28 22:23:08,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-28 22:23:11,526 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1000, loss[loss=0.06449, simple_loss=0.08748, pruned_loss=0.01385, audio_tagging_loss=0.006903, over 15790.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09004, pruned_loss=0.01222, audio_tagging_loss=0.00866, over 3020359.89 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:23:20,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.063e+01 9.775e+01 1.049e+02 2.458e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-28 22:23:24,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2023-11-28 22:23:25,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3694073.3333333335, ans=0.2 2023-11-28 22:23:39,345 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:23:59,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3694273.3333333335, ans=0.0 2023-11-28 22:24:09,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-28 22:24:13,296 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1050, loss[loss=0.05499, simple_loss=0.07293, pruned_loss=0.01024, audio_tagging_loss=0.008278, over 15416.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08993, pruned_loss=0.0122, audio_tagging_loss=0.008505, over 3030314.43 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:24:18,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3694340.0, ans=0.0 2023-11-28 22:25:03,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3694606.6666666665, ans=0.125 2023-11-28 22:25:11,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-28 22:25:15,603 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1100, loss[loss=0.08277, simple_loss=0.1088, pruned_loss=0.02026, audio_tagging_loss=0.008126, over 15463.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0898, pruned_loss=0.01224, audio_tagging_loss=0.008586, over 3031700.05 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:25:19,641 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 22:25:24,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.004e+01 9.578e+01 1.033e+02 1.285e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:25:39,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3694806.6666666665, ans=0.1 2023-11-28 22:25:47,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3694806.6666666665, ans=0.1 2023-11-28 22:25:48,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3694806.6666666665, ans=0.125 2023-11-28 22:25:58,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3694873.3333333335, ans=0.125 2023-11-28 22:26:03,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3694873.3333333335, ans=0.125 2023-11-28 22:26:03,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3694873.3333333335, ans=0.125 2023-11-28 22:26:13,900 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-28 22:26:16,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3695006.6666666665, ans=0.0 2023-11-28 22:26:17,360 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1150, loss[loss=0.06743, simple_loss=0.1019, pruned_loss=0.01071, audio_tagging_loss=0.005757, over 15087.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08883, pruned_loss=0.01206, audio_tagging_loss=0.008589, over 3032211.25 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:26:45,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3695140.0, ans=0.5 2023-11-28 22:26:59,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3695206.6666666665, ans=0.125 2023-11-28 22:27:00,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3695206.6666666665, ans=0.125 2023-11-28 22:27:15,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-28 22:27:18,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3695340.0, ans=6.0 2023-11-28 22:27:19,247 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1200, loss[loss=0.06976, simple_loss=0.09379, pruned_loss=0.01212, audio_tagging_loss=0.01075, over 15707.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08876, pruned_loss=0.01203, audio_tagging_loss=0.008544, over 3031119.96 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:27:20,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3695340.0, ans=0.125 2023-11-28 22:27:21,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3695340.0, ans=0.0 2023-11-28 22:27:27,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.745e+01 9.451e+01 1.036e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 22:27:44,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3695473.3333333335, ans=0.125 2023-11-28 22:27:50,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3695473.3333333335, ans=0.0 2023-11-28 22:27:57,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3695540.0, ans=0.2 2023-11-28 22:28:02,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695540.0, ans=0.1 2023-11-28 22:28:16,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. limit=6.0 2023-11-28 22:28:16,957 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-28 22:28:20,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3695673.3333333335, ans=0.0 2023-11-28 22:28:20,989 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1250, loss[loss=0.06639, simple_loss=0.08822, pruned_loss=0.01115, audio_tagging_loss=0.01113, over 13874.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08879, pruned_loss=0.01198, audio_tagging_loss=0.008606, over 3025312.10 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:28:33,693 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:28:48,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3695806.6666666665, ans=0.07 2023-11-28 22:28:55,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3695806.6666666665, ans=0.125 2023-11-28 22:28:56,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3695873.3333333335, ans=0.0 2023-11-28 22:29:13,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2023-11-28 22:29:18,983 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-28 22:29:22,791 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1300, loss[loss=0.0525, simple_loss=0.06317, pruned_loss=0.01023, audio_tagging_loss=0.01069, over 14828.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08798, pruned_loss=0.01192, audio_tagging_loss=0.008592, over 3028165.23 frames. 
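The optim.py:476 reports are easy to decode: the five numbers are min/25%/median/75%/max of recent gradient norms, and in every report the clipping threshold equals Clipping_scale times the median (2.0 x 9.451e+01 = 1.890e+02 in the report just above). A hedged sketch of that bookkeeping, a stand-in for what the optimizer does internally rather than its actual code:

```python
import torch

class GradNormClipper:
    """Clip at clipping_scale * median of recent grad norms (sketch only;
    in icefall this logic is integrated into the optimizer's step)."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []

    def clip_(self, params) -> None:
        params = [p for p in params if p.grad is not None]
        norm = torch.cat([p.grad.flatten() for p in params]).norm().item()
        self.norms = (self.norms + [norm])[-self.history:]
        q = torch.quantile(
            torch.tensor(self.norms), torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        )
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:                           # counted as "clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
```

The "percent-clipped" field would then be the fraction of recent steps whose norm exceeded that threshold.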
], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:29:28,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3696006.6666666665, ans=0.125 2023-11-28 22:29:30,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 9.038e+01 9.627e+01 1.019e+02 1.676e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:29:34,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3696073.3333333335, ans=0.0 2023-11-28 22:29:44,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3696073.3333333335, ans=0.0 2023-11-28 22:29:50,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=8.0 2023-11-28 22:30:05,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3696206.6666666665, ans=0.125 2023-11-28 22:30:14,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3696273.3333333335, ans=0.0 2023-11-28 22:30:14,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3696273.3333333335, ans=0.125 2023-11-28 22:30:18,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-28 22:30:20,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.32 vs. limit=10.0 2023-11-28 22:30:21,008 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-28 22:30:24,567 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1350, loss[loss=0.0647, simple_loss=0.0876, pruned_loss=0.01094, audio_tagging_loss=0.009961, over 15625.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08928, pruned_loss=0.01207, audio_tagging_loss=0.008521, over 3036153.78 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:30:40,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3696406.6666666665, ans=0.125 2023-11-28 22:30:53,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3696473.3333333335, ans=0.125 2023-11-28 22:31:10,313 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 22:31:16,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3696606.6666666665, ans=0.125 2023-11-28 22:31:22,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554500 2023-11-28 22:31:26,149 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1400, loss[loss=0.07851, simple_loss=0.1044, pruned_loss=0.01863, audio_tagging_loss=0.007699, over 15442.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.0888, pruned_loss=0.01213, audio_tagging_loss=0.008576, over 3037851.66 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:31:35,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.987e+01 9.786e+01 1.046e+02 1.300e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 22:31:36,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3696673.3333333335, ans=0.0 2023-11-28 22:31:40,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3696740.0, ans=0.0 2023-11-28 22:31:41,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3696740.0, ans=0.1 2023-11-28 22:31:43,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3696740.0, ans=0.07 2023-11-28 22:31:44,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3696740.0, ans=0.125 2023-11-28 22:32:15,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3696940.0, ans=0.0 2023-11-28 22:32:17,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3696940.0, ans=0.025 2023-11-28 22:32:24,838 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554550 2023-11-28 22:32:28,208 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1450, loss[loss=0.08541, simple_loss=0.1157, pruned_loss=0.02022, audio_tagging_loss=0.007325, over 15864.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08989, pruned_loss=0.01236, audio_tagging_loss=0.008556, over 3037825.76 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:32:35,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3697006.6666666665, ans=0.1 2023-11-28 22:32:46,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3697073.3333333335, ans=0.125 2023-11-28 22:33:25,736 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554600 2023-11-28 22:33:29,720 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1500, loss[loss=0.06203, simple_loss=0.08802, pruned_loss=0.01124, audio_tagging_loss=0.006786, over 13768.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09039, pruned_loss=0.01238, audio_tagging_loss=0.008572, over 3034071.18 frames. 
], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:33:40,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.916e+01 9.599e+01 1.025e+02 1.569e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:33:44,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3697406.6666666665, ans=0.0 2023-11-28 22:33:47,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2023-11-28 22:33:56,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3697473.3333333335, ans=0.0 2023-11-28 22:34:27,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554650 2023-11-28 22:34:31,288 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1550, loss[loss=0.0539, simple_loss=0.07124, pruned_loss=0.009575, audio_tagging_loss=0.008706, over 15231.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08928, pruned_loss=0.0122, audio_tagging_loss=0.008682, over 3032637.50 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:34:31,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=12.0 2023-11-28 22:34:34,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.66 vs. limit=10.0 2023-11-28 22:34:45,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3697740.0, ans=0.125 2023-11-28 22:34:47,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3697740.0, ans=0.125 2023-11-28 22:35:19,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3697940.0, ans=0.0 2023-11-28 22:35:21,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3697940.0, ans=0.125 2023-11-28 22:35:29,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554700 2023-11-28 22:35:31,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-28 22:35:32,755 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1600, loss[loss=0.07059, simple_loss=0.08848, pruned_loss=0.01585, audio_tagging_loss=0.0105, over 14586.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08913, pruned_loss=0.0122, audio_tagging_loss=0.008739, over 3033085.58 frames. 
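The recurring scaling.py:213 lines track ScheduledFloat values: regularization knobs (skip rates, dropout_p, balancer probabilities) that are functions of the global batch count rather than constants. At batch_count of about 3.69e6 they have long since reached their endpoint values (skip rates 0.0, dropout 0.1, probs 0.125). A piecewise-linear sketch of the idea (simplified; the real class in icefall's scaling.py carries more machinery):

```python
import bisect

class ScheduledFloat:
    """A float-valued schedule over the global batch count: linear between
    (batch_count, value) knots, clamped at the ends (sketch)."""

    def __init__(self, *points):
        self.points = sorted(points)   # e.g. ((0.0, 0.5), (16000.0, 0.0))
        self.batch_count = 0.0         # updated by the training loop

    def __float__(self) -> float:
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, self.batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (self.batch_count - x0) / (x1 - x0)

# A skip rate that anneals to zero (knot values hypothetical):
attention_skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
```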
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:35:34,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3698006.6666666665, ans=0.0 2023-11-28 22:35:44,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.984e+01 9.580e+01 1.035e+02 1.494e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:35:49,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3698073.3333333335, ans=0.5 2023-11-28 22:35:59,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3698140.0, ans=0.0 2023-11-28 22:35:59,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3698140.0, ans=0.125 2023-11-28 22:36:05,847 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:36:09,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3698206.6666666665, ans=0.0 2023-11-28 22:36:10,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3698206.6666666665, ans=0.5 2023-11-28 22:36:31,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554750 2023-11-28 22:36:33,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3698273.3333333335, ans=0.125 2023-11-28 22:36:35,095 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1650, loss[loss=0.07132, simple_loss=0.0997, pruned_loss=0.0133, audio_tagging_loss=0.008177, over 15366.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08858, pruned_loss=0.01203, audio_tagging_loss=0.008775, over 3039941.69 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:36:41,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3698340.0, ans=0.125 2023-11-28 22:36:45,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2023-11-28 22:36:54,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-28 22:36:55,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-28 22:37:07,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3698473.3333333335, ans=0.125 2023-11-28 22:37:07,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3698473.3333333335, ans=0.0 2023-11-28 22:37:33,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554800 2023-11-28 22:37:37,079 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1700, loss[loss=0.06792, simple_loss=0.1045, pruned_loss=0.009485, audio_tagging_loss=0.006189, over 15936.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08957, pruned_loss=0.01221, audio_tagging_loss=0.008738, over 3041357.42 frames. 
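The loss fields in these reports are internally consistent: total = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, i.e. a simple_loss_scale of 0.5 and an audio-tagging weight of 1.0 (weights inferred from the numbers themselves, not read from the recipe's code). Checking the batch 1700 record just above:

```python
# Batch 1700 per-batch record: loss=0.06792, simple_loss=0.1045,
# pruned_loss=0.009485, audio_tagging_loss=0.006189 (weights inferred).
simple, pruned, audio_tagging = 0.1045, 0.009485, 0.006189
total = 0.5 * simple + pruned + 1.0 * audio_tagging
assert abs(total - 0.06792) < 1e-4, total   # 0.067924
```

The tot_loss fields satisfy the same identity (0.5 * 0.08957 + 0.01221 + 0.008738 is approximately the reported 0.06573).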
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:37:45,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698673.3333333335, ans=0.1 2023-11-28 22:37:47,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.852e+01 9.479e+01 1.004e+02 1.252e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 22:38:02,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3698806.6666666665, ans=0.0 2023-11-28 22:38:02,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2023-11-28 22:38:26,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2023-11-28 22:38:34,909 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554850 2023-11-28 22:38:38,829 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1750, loss[loss=0.08446, simple_loss=0.1185, pruned_loss=0.01974, audio_tagging_loss=0.005459, over 13958.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08916, pruned_loss=0.01205, audio_tagging_loss=0.008708, over 3030918.33 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:38:43,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3699006.6666666665, ans=0.125 2023-11-28 22:38:58,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3699073.3333333335, ans=0.0 2023-11-28 22:39:15,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3699206.6666666665, ans=0.0 2023-11-28 22:39:35,382 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:39:36,335 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554900 2023-11-28 22:39:40,447 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1800, loss[loss=0.06328, simple_loss=0.08941, pruned_loss=0.009617, audio_tagging_loss=0.00896, over 14868.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08994, pruned_loss=0.01223, audio_tagging_loss=0.00867, over 3036530.95 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:39:51,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.095e+01 9.843e+01 1.068e+02 2.957e+02, threshold=1.969e+02, percent-clipped=2.0 2023-11-28 22:39:59,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3699406.6666666665, ans=0.125 2023-11-28 22:40:17,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3699540.0, ans=0.0 2023-11-28 22:40:38,780 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 554950 2023-11-28 22:40:42,255 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1850, loss[loss=0.07767, simple_loss=0.09974, pruned_loss=0.01785, audio_tagging_loss=0.009952, over 15185.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0896, pruned_loss=0.01217, audio_tagging_loss=0.008675, over 3035397.88 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:40:43,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2023-11-28 22:40:51,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3699673.3333333335, ans=0.125 2023-11-28 22:41:09,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3699806.6666666665, ans=0.125 2023-11-28 22:41:27,107 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:41:33,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3699940.0, ans=0.125 2023-11-28 22:41:40,464 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555000 2023-11-28 22:41:44,247 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1900, loss[loss=0.04956, simple_loss=0.06562, pruned_loss=0.007261, audio_tagging_loss=0.009488, over 16272.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08998, pruned_loss=0.01215, audio_tagging_loss=0.008573, over 3039165.71 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:41:48,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3700006.6666666665, ans=0.1 2023-11-28 22:41:55,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.917e+01 9.676e+01 1.038e+02 1.630e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 22:42:41,993 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555050 2023-11-28 22:42:45,399 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 1950, loss[loss=0.06231, simple_loss=0.08522, pruned_loss=0.01123, audio_tagging_loss=0.008467, over 15118.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08993, pruned_loss=0.0124, audio_tagging_loss=0.008486, over 3042672.93 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:42:56,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3700340.0, ans=0.09899494936611666 2023-11-28 22:43:08,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3700406.6666666665, ans=0.2 2023-11-28 22:43:20,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3700473.3333333335, ans=0.125 2023-11-28 22:43:21,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3700540.0, ans=0.1 2023-11-28 22:43:43,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555100 2023-11-28 22:43:46,981 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2000, loss[loss=0.05994, simple_loss=0.08304, pruned_loss=0.007527, audio_tagging_loss=0.01089, over 15424.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09004, pruned_loss=0.01229, audio_tagging_loss=0.008475, over 3044381.96 frames. 
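grad_scale in these reports hops between powers of two: 8.0 around batches 1450-1550, 16.0 from batch 1600, 32.0 by batch 2000, back to 16.0 at batch 2050. That is the signature of dynamic loss scaling for fp16 training: the scaler halves when a step produces inf/NaN gradients and grows again after a run of finite steps. The generic PyTorch pattern looks like this (illustrative AMP usage, not the recipe's exact loop; the scale and interval values are placeholders):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=1000)
for batch in train_dl:                     # model/optimizer/train_dl assumed
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)  # hypothetical helper
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skipped if grads are inf/NaN
    scaler.update()          # halve on overflow, grow after enough good steps
    cur_grad_scale = scaler.get_scale()    # the value logged as grad_scale
```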
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:43:49,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3700673.3333333335, ans=0.0 2023-11-28 22:43:58,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.916e+01 9.601e+01 1.024e+02 1.438e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:43:59,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3700740.0, ans=0.2 2023-11-28 22:44:18,873 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:44:45,216 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555150 2023-11-28 22:44:48,558 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2050, loss[loss=0.05235, simple_loss=0.07377, pruned_loss=0.007079, audio_tagging_loss=0.008394, over 15473.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08905, pruned_loss=0.01215, audio_tagging_loss=0.008498, over 3038447.66 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:44:48,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3701006.6666666665, ans=0.125 2023-11-28 22:45:02,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3701073.3333333335, ans=0.125 2023-11-28 22:45:12,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2023-11-28 22:45:18,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3701140.0, ans=0.0 2023-11-28 22:45:25,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2023-11-28 22:45:34,494 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:45:39,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-28 22:45:46,378 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555200 2023-11-28 22:45:50,162 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2100, loss[loss=0.06862, simple_loss=0.09871, pruned_loss=0.01273, audio_tagging_loss=0.006531, over 16474.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08976, pruned_loss=0.01225, audio_tagging_loss=0.008427, over 3038614.84 frames. 
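The tot_loss aggregates are reported over roughly 3.0e6 frames with fractional frame counts, while each batch contributes only about 15k frames. Both details fit an exponentially decayed running sum, tot <- tot * (1 - 1/200) + batch, whose steady-state frame mass is about 200 * 15k = 3.0e6 (the decay constant is inferred from these numbers, not read from the code). A sketch:

```python
class RunningLoss:
    """Exponentially decayed running sums; would explain the fractional
    'over N frames' counts in the tot_loss reports (decay inferred)."""

    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames   # the reported tot_loss value
```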
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:46:02,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.958e+01 9.568e+01 1.025e+02 1.229e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 22:46:04,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3701406.6666666665, ans=0.125 2023-11-28 22:46:12,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3701406.6666666665, ans=0.0 2023-11-28 22:46:30,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3701540.0, ans=0.125 2023-11-28 22:46:32,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2023-11-28 22:46:47,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3701606.6666666665, ans=0.125 2023-11-28 22:46:48,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555250 2023-11-28 22:46:52,246 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2150, loss[loss=0.05552, simple_loss=0.07341, pruned_loss=0.01205, audio_tagging_loss=0.006757, over 14582.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09095, pruned_loss=0.01246, audio_tagging_loss=0.008321, over 3042424.65 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:46:55,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2023-11-28 22:46:56,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3701673.3333333335, ans=0.125 2023-11-28 22:47:01,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3701673.3333333335, ans=0.125 2023-11-28 22:47:30,104 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:47:50,731 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555300 2023-11-28 22:47:54,164 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2200, loss[loss=0.06865, simple_loss=0.09779, pruned_loss=0.01026, audio_tagging_loss=0.009493, over 16294.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09081, pruned_loss=0.01247, audio_tagging_loss=0.008362, over 3044276.69 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:48:06,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.046e+01 9.585e+01 1.059e+02 1.446e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 22:48:09,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3702073.3333333335, ans=0.2 2023-11-28 22:48:20,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:29,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3702206.6666666665, ans=0.0 2023-11-28 22:48:32,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702206.6666666665, ans=0.125 2023-11-28 22:48:45,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3702273.3333333335, ans=0.0 2023-11-28 22:48:46,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3702273.3333333335, ans=0.2 2023-11-28 22:48:52,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555350 2023-11-28 22:48:53,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1 2023-11-28 22:48:55,609 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2250, loss[loss=0.08344, simple_loss=0.1238, pruned_loss=0.01382, audio_tagging_loss=0.007738, over 16404.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09073, pruned_loss=0.01239, audio_tagging_loss=0.008408, over 3043821.55 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:48:59,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3702340.0, ans=0.0 2023-11-28 22:49:04,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3702340.0, ans=0.0 2023-11-28 22:49:19,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3702473.3333333335, ans=0.0 2023-11-28 22:49:36,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702540.0, ans=0.1 2023-11-28 22:49:52,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555400 2023-11-28 22:49:56,832 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2300, loss[loss=0.0635, simple_loss=0.09224, pruned_loss=0.01269, audio_tagging_loss=0.004692, over 15300.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09055, pruned_loss=0.01246, audio_tagging_loss=0.008375, over 3043303.99 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:50:09,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.895e+01 9.474e+01 1.045e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 22:50:09,663 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:50:15,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=12.0 2023-11-28 22:50:24,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3702806.6666666665, ans=0.0 2023-11-28 22:50:31,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702806.6666666665, ans=0.1 2023-11-28 22:50:35,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3702873.3333333335, ans=0.0 2023-11-28 22:50:40,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3702873.3333333335, ans=0.0 2023-11-28 22:50:43,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3702873.3333333335, ans=0.125 2023-11-28 22:50:51,558 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:50:53,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3702940.0, ans=0.125 2023-11-28 22:50:55,222 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555450 2023-11-28 22:50:58,634 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2350, loss[loss=0.06414, simple_loss=0.09207, pruned_loss=0.01012, audio_tagging_loss=0.007988, over 15354.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08986, pruned_loss=0.01218, audio_tagging_loss=0.008514, over 3047592.25 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:51:08,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3703006.6666666665, ans=0.0 2023-11-28 22:51:29,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703140.0, ans=0.1 2023-11-28 22:51:50,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3703273.3333333335, ans=0.0 2023-11-28 22:51:55,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703273.3333333335, ans=0.1 2023-11-28 22:51:56,456 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555500 2023-11-28 22:51:59,838 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2400, loss[loss=0.06122, simple_loss=0.08122, pruned_loss=0.01004, audio_tagging_loss=0.01057, over 15137.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08948, pruned_loss=0.01203, audio_tagging_loss=0.008648, over 3049602.33 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:52:02,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2023-11-28 22:52:03,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. 
limit=15.0 2023-11-28 22:52:11,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.887e+01 9.633e+01 1.018e+02 1.587e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:52:28,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3703473.3333333335, ans=0.125 2023-11-28 22:52:35,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-28 22:52:44,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3703540.0, ans=0.125 2023-11-28 22:52:46,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3703540.0, ans=0.2 2023-11-28 22:52:48,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2023-11-28 22:52:51,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-28 22:52:55,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=12.0 2023-11-28 22:52:57,468 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555550 2023-11-28 22:53:01,559 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2450, loss[loss=0.06857, simple_loss=0.1067, pruned_loss=0.008622, audio_tagging_loss=0.006609, over 14988.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08987, pruned_loss=0.01208, audio_tagging_loss=0.008669, over 3050726.61 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:53:05,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3703673.3333333335, ans=0.125 2023-11-28 22:53:15,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-28 22:53:25,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703806.6666666665, ans=0.1 2023-11-28 22:53:28,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3703806.6666666665, ans=0.0 2023-11-28 22:53:29,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3703806.6666666665, ans=0.0 2023-11-28 22:53:34,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=12.0 2023-11-28 22:53:59,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555600 2023-11-28 22:53:59,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3703940.0, ans=0.125 2023-11-28 22:54:04,196 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2500, loss[loss=0.05197, simple_loss=0.06902, pruned_loss=0.008638, audio_tagging_loss=0.008821, over 14555.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08897, pruned_loss=0.01187, audio_tagging_loss=0.008762, over 3051869.57 frames. 
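The scaling.py:1022 "Whitening" lines compare a per-module whiteness metric against a limit (e.g. metric=13.07 vs. limit=15.0 above): the Whiten wrappers in the Zipformer penalize activations whose channel covariance drifts away from a multiple of the identity, and these messages fire as a metric approaches its limit. One plausible form of the metric, equal to 1.0 for perfectly white features and growing with covariance anisotropy (an approximation, not the verbatim icefall formula):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1.0, with equality iff each
    channel group's covariance is a multiple of the identity (sketch)."""
    n, c = x.shape
    cpg = c // num_groups                               # channels per group
    xg = x.reshape(n, num_groups, cpg).transpose(0, 1)  # (G, n, cpg)
    cov = xg.transpose(1, 2) @ xg / n                   # (G, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()     # mean eigenvalue
    mean_sq = (cov ** 2).sum() / (num_groups * cpg)     # mean squared eigenvalue
    return mean_sq / (mean_diag ** 2 + 1e-20)
```

By Cauchy-Schwarz the ratio is at least 1, so the limit (6.0 to 22.5 in these logs) bounds how non-isotropic a module's output may become before the penalty engages.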
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:54:17,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.036e+01 9.436e+01 1.021e+02 1.491e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 22:54:28,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-11-28 22:54:31,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3704140.0, ans=0.125 2023-11-28 22:54:33,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3704140.0, ans=0.125 2023-11-28 22:54:39,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-28 22:54:44,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3704206.6666666665, ans=0.125 2023-11-28 22:55:03,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555650 2023-11-28 22:55:06,525 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2550, loss[loss=0.0766, simple_loss=0.1098, pruned_loss=0.01082, audio_tagging_loss=0.01086, over 15026.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.0896, pruned_loss=0.01208, audio_tagging_loss=0.008633, over 3051323.06 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:55:10,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704340.0, ans=0.1 2023-11-28 22:55:12,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3704340.0, ans=0.0 2023-11-28 22:55:12,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704340.0, ans=0.1 2023-11-28 22:55:13,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3704340.0, ans=0.0 2023-11-28 22:55:21,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3704406.6666666665, ans=10.0 2023-11-28 22:55:28,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3704406.6666666665, ans=0.0 2023-11-28 22:55:44,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3704540.0, ans=0.125 2023-11-28 22:56:04,092 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555700 2023-11-28 22:56:07,451 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2600, loss[loss=0.03371, simple_loss=0.03644, pruned_loss=0.003227, audio_tagging_loss=0.01226, over 14870.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0898, pruned_loss=0.01204, audio_tagging_loss=0.008537, over 3046658.67 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:56:20,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.832e+01 9.497e+01 1.024e+02 1.176e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 22:56:48,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2023-11-28 22:56:49,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.11 vs. limit=6.0 2023-11-28 22:57:02,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3704940.0, ans=0.125 2023-11-28 22:57:05,600 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555750 2023-11-28 22:57:09,018 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2650, loss[loss=0.04623, simple_loss=0.06505, pruned_loss=0.004323, audio_tagging_loss=0.009381, over 14645.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08945, pruned_loss=0.01204, audio_tagging_loss=0.008446, over 3048307.12 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:57:17,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3705006.6666666665, ans=0.0 2023-11-28 22:57:20,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.24 vs. limit=10.0 2023-11-28 22:57:58,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3705273.3333333335, ans=0.2 2023-11-28 22:58:07,126 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555800 2023-11-28 22:58:10,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3705340.0, ans=0.125 2023-11-28 22:58:11,030 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2700, loss[loss=0.05345, simple_loss=0.07134, pruned_loss=0.007973, audio_tagging_loss=0.009812, over 16466.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0895, pruned_loss=0.012, audio_tagging_loss=0.008424, over 3056944.14 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:58:24,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.013e+01 9.562e+01 1.012e+02 1.188e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:58:29,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-11-28 22:58:51,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3705540.0, ans=0.0 2023-11-28 22:58:55,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3705540.0, ans=0.125 2023-11-28 22:59:05,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2023-11-28 22:59:09,258 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555850 2023-11-28 22:59:12,656 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2750, loss[loss=0.06345, simple_loss=0.07938, pruned_loss=0.01352, audio_tagging_loss=0.01024, over 15971.00 frames. 
], tot_loss[loss=0.06466, simple_loss=0.08874, pruned_loss=0.01182, audio_tagging_loss=0.008472, over 3050129.97 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:59:15,261 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:59:20,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3705673.3333333335, ans=0.125 2023-11-28 22:59:22,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3705673.3333333335, ans=15.0 2023-11-28 22:59:29,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3705740.0, ans=0.125 2023-11-28 23:00:06,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3705940.0, ans=0.2 2023-11-28 23:00:07,614 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:00:10,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555900 2023-11-28 23:00:12,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3706006.6666666665, ans=0.1 2023-11-28 23:00:13,461 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2800, loss[loss=0.05023, simple_loss=0.06384, pruned_loss=0.007515, audio_tagging_loss=0.0108, over 15093.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08834, pruned_loss=0.0117, audio_tagging_loss=0.008534, over 3056249.56 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:00:27,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.973e+01 9.470e+01 1.013e+02 1.282e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 23:00:31,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3706073.3333333335, ans=0.0 2023-11-28 23:00:44,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3706140.0, ans=0.2 2023-11-28 23:01:12,342 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 555950 2023-11-28 23:01:16,233 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2850, loss[loss=0.05007, simple_loss=0.07062, pruned_loss=0.005861, audio_tagging_loss=0.008902, over 14748.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08874, pruned_loss=0.01186, audio_tagging_loss=0.008471, over 3052210.37 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:01:16,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3706340.0, ans=0.125 2023-11-28 23:01:18,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. 
limit=15.0 2023-11-28 23:01:59,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3706540.0, ans=0.125 2023-11-28 23:02:11,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3706606.6666666665, ans=0.015 2023-11-28 23:02:14,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556000 2023-11-28 23:02:21,337 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2900, loss[loss=0.05082, simple_loss=0.06338, pruned_loss=0.008795, audio_tagging_loss=0.01033, over 15030.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08833, pruned_loss=0.01184, audio_tagging_loss=0.008586, over 3052105.05 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:02:27,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-11-28 23:02:33,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3706740.0, ans=0.0 2023-11-28 23:02:34,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.955e+01 9.573e+01 1.059e+02 1.416e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 23:02:44,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3706806.6666666665, ans=0.0 2023-11-28 23:03:01,899 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:03:14,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3706940.0, ans=0.0 2023-11-28 23:03:19,523 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556050 2023-11-28 23:03:20,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3706940.0, ans=0.125 2023-11-28 23:03:22,910 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 2950, loss[loss=0.06895, simple_loss=0.09631, pruned_loss=0.01251, audio_tagging_loss=0.008282, over 15172.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08883, pruned_loss=0.01204, audio_tagging_loss=0.008632, over 3053638.09 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:03:27,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-28 23:03:29,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3707006.6666666665, ans=0.125 2023-11-28 23:03:58,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3707140.0, ans=0.125 2023-11-28 23:04:02,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. 
limit=15.0 2023-11-28 23:04:04,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3707206.6666666665, ans=0.125 2023-11-28 23:04:06,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3707206.6666666665, ans=0.125 2023-11-28 23:04:21,529 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-28 23:04:22,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3707273.3333333335, ans=0.04949747468305833 2023-11-28 23:04:22,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3707273.3333333335, ans=0.125 2023-11-28 23:04:24,921 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3000, loss[loss=0.08047, simple_loss=0.1136, pruned_loss=0.01297, audio_tagging_loss=0.01071, over 17083.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08989, pruned_loss=0.01201, audio_tagging_loss=0.008693, over 3055499.49 frames. ], batch size: 64, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:04:24,922 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-28 23:04:44,065 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6235, 5.1405, 5.5170, 4.8612], device='cuda:1') 2023-11-28 23:04:57,547 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4575, 2.9862, 3.3050, 2.9761, 3.6452, 3.7427, 3.2380, 3.1992], device='cuda:1') 2023-11-28 23:05:02,670 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8201, 5.9583, 5.9319, 5.9985], device='cuda:1') 2023-11-28 23:05:04,345 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05749, simple_loss=0.05049, pruned_loss=0.005328, audio_tagging_loss=0.02692, over 4681554.00 frames. 2023-11-28 23:05:04,346 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-28 23:05:20,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.232e+01 9.628e+01 1.042e+02 1.260e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 23:05:20,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3707406.6666666665, ans=0.05 2023-11-28 23:05:24,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3707406.6666666665, ans=0.125 2023-11-28 23:05:26,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3707406.6666666665, ans=0.125 2023-11-28 23:05:35,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3707473.3333333335, ans=0.125 2023-11-28 23:05:39,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-28 23:06:02,577 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-28 23:06:05,942 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3050, loss[loss=0.07502, simple_loss=0.1093, pruned_loss=0.0146, audio_tagging_loss=0.005778, over 15306.00 frames. 
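During the validation pass above, zipformer.py:1877 dumps attn_weights_entropy tensors, apparently one entry per attention head of the named self_attn_weights module: high values mean a head spreads its attention widely, values near zero mean it attends almost deterministically. A diagnostic sketch of that computation (shape conventions assumed):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, batch, tgt_len, src_len) with rows summing to 1.
    Returns the mean attention entropy per head, in nats (sketch)."""
    p = attn.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)    # (num_heads, batch, tgt_len)
    return ent.mean(dim=(1, 2))         # one scalar per head

# A uniform distribution over src_len positions has entropy log(src_len),
# so values near 5.0 correspond to ~150 effective attended positions.
```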
], tot_loss[loss=0.06605, simple_loss=0.09085, pruned_loss=0.01201, audio_tagging_loss=0.008608, over 3055189.17 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:06:25,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3707740.0, ans=0.125 2023-11-28 23:06:44,926 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:06:46,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3707873.3333333335, ans=0.1 2023-11-28 23:07:04,304 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-28 23:07:08,263 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3100, loss[loss=0.08407, simple_loss=0.1188, pruned_loss=0.01524, audio_tagging_loss=0.009417, over 17134.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09094, pruned_loss=0.01204, audio_tagging_loss=0.00859, over 3050097.16 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:07:16,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3708006.6666666665, ans=0.125 2023-11-28 23:07:16,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3708006.6666666665, ans=0.125 2023-11-28 23:07:21,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5 2023-11-28 23:07:23,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.064e+01 9.672e+01 1.048e+02 1.274e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 23:07:55,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2023-11-28 23:07:56,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3708273.3333333335, ans=0.125 2023-11-28 23:08:05,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-28 23:08:08,486 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3150, loss[loss=0.09695, simple_loss=0.1212, pruned_loss=0.02774, audio_tagging_loss=0.008585, over 14689.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09109, pruned_loss=0.01215, audio_tagging_loss=0.008653, over 3049681.24 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:08:31,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3708406.6666666665, ans=0.125 2023-11-28 23:08:36,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.11 vs. 
limit=15.0 2023-11-28 23:08:40,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3708473.3333333335, ans=0.0 2023-11-28 23:08:45,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3708540.0, ans=0.125 2023-11-28 23:08:47,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3708540.0, ans=0.125 2023-11-28 23:08:48,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-28 23:09:07,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-28 23:09:10,942 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3200, loss[loss=0.06815, simple_loss=0.1063, pruned_loss=0.009755, audio_tagging_loss=0.00525, over 15362.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09129, pruned_loss=0.01212, audio_tagging_loss=0.008681, over 3053114.75 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:09:21,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3708673.3333333335, ans=0.125 2023-11-28 23:09:26,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.731e+01 9.590e+01 1.027e+02 1.409e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 23:09:34,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3708806.6666666665, ans=0.1 2023-11-28 23:09:40,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3708806.6666666665, ans=0.2 2023-11-28 23:09:50,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3708873.3333333335, ans=0.125 2023-11-28 23:10:03,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3708940.0, ans=0.125 2023-11-28 23:10:09,096 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-28 23:10:12,555 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3250, loss[loss=0.06977, simple_loss=0.09665, pruned_loss=0.01327, audio_tagging_loss=0.008174, over 14779.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09099, pruned_loss=0.01215, audio_tagging_loss=0.008779, over 3044495.15 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:10:19,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3709006.6666666665, ans=0.125 2023-11-28 23:10:30,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3709073.3333333335, ans=0.125 2023-11-28 23:10:31,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. 
limit=15.0 2023-11-28 23:10:40,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3709140.0, ans=0.125 2023-11-28 23:10:44,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3709140.0, ans=0.125 2023-11-28 23:10:47,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3709140.0, ans=0.125 2023-11-28 23:10:48,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3709206.6666666665, ans=0.1 2023-11-28 23:11:06,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2023-11-28 23:11:10,658 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-28 23:11:14,533 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3300, loss[loss=0.07541, simple_loss=0.09284, pruned_loss=0.0208, audio_tagging_loss=0.008194, over 13824.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09141, pruned_loss=0.0123, audio_tagging_loss=0.00879, over 3049600.37 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:11:15,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3709340.0, ans=0.0 2023-11-28 23:11:19,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3709340.0, ans=0.1 2023-11-28 23:11:21,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3709340.0, ans=0.125 2023-11-28 23:11:31,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.994e+01 9.101e+01 9.601e+01 1.014e+02 1.380e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 23:11:48,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2023-11-28 23:12:05,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3709606.6666666665, ans=0.0 2023-11-28 23:12:05,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3709606.6666666665, ans=0.0 2023-11-28 23:12:08,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3709606.6666666665, ans=0.0 2023-11-28 23:12:12,442 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-28 23:12:16,473 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3350, loss[loss=0.07832, simple_loss=0.09879, pruned_loss=0.0221, audio_tagging_loss=0.006822, over 14908.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09106, pruned_loss=0.01221, audio_tagging_loss=0.008787, over 3052191.15 frames. 
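The per-batch loss[...] fields above are internally consistent with a fixed linear combination of their components: half the simple transducer loss, plus the pruned transducer loss, plus the audio-tagging distillation loss at full weight. For the batch 3350 record above, 0.5 * 0.09879 + 0.0221 + 0.006822 = 0.078317, matching the logged loss=0.07832. A check of that relation; the 0.5 and 1.0 weights here are read off the logged numbers, not taken from the training code:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    """Reconstruct the logged `loss` from its logged components.

    The weights are inferred by fitting the records above and are
    assumptions, e.g. batch 3350:
    0.5 * 0.09879 + 0.0221 + 1.0 * 0.006822 = 0.078317 ~ 0.07832.
    """
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Batch 3350 and its running aggregate, both from the record above:
assert abs(combined_loss(0.09879, 0.0221, 0.006822) - 0.07832) < 5e-5
assert abs(combined_loss(0.09106, 0.01221, 0.008787) - 0.06652) < 5e-5
```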
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:12:27,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3709740.0, ans=0.125 2023-11-28 23:12:36,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3709740.0, ans=0.2 2023-11-28 23:12:40,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-28 23:12:59,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-11-28 23:13:01,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3709873.3333333335, ans=0.0 2023-11-28 23:13:14,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-28 23:13:18,075 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3400, loss[loss=0.05716, simple_loss=0.08264, pruned_loss=0.01037, audio_tagging_loss=0.005467, over 15472.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08927, pruned_loss=0.012, audio_tagging_loss=0.008756, over 3052865.39 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:13:23,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3710006.6666666665, ans=0.025 2023-11-28 23:13:29,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3710073.3333333335, ans=0.0 2023-11-28 23:13:33,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.795e+01 9.500e+01 1.053e+02 1.456e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 23:13:51,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=22.5 2023-11-28 23:14:16,452 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-28 23:14:16,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3710273.3333333335, ans=0.0 2023-11-28 23:14:19,894 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3450, loss[loss=0.05559, simple_loss=0.07667, pruned_loss=0.007986, audio_tagging_loss=0.009263, over 13881.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08981, pruned_loss=0.0121, audio_tagging_loss=0.008577, over 3049659.09 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:14:22,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3710340.0, ans=0.125 2023-11-28 23:14:27,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-28 23:14:38,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710406.6666666665, ans=0.1 2023-11-28 23:14:41,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. 
limit=15.0 2023-11-28 23:14:57,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3710540.0, ans=0.0 2023-11-28 23:15:08,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3710606.6666666665, ans=0.0 2023-11-28 23:15:16,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3710606.6666666665, ans=0.125 2023-11-28 23:15:17,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-28 23:15:21,991 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3500, loss[loss=0.07444, simple_loss=0.09616, pruned_loss=0.01696, audio_tagging_loss=0.009394, over 14597.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08991, pruned_loss=0.01218, audio_tagging_loss=0.008575, over 3047699.97 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:15:38,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.535e+01 1.020e+02 1.277e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 23:15:38,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3710740.0, ans=0.125 2023-11-28 23:15:56,633 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:16:08,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3710873.3333333335, ans=0.0 2023-11-28 23:16:15,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0 2023-11-28 23:16:20,435 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-28 23:16:24,405 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3550, loss[loss=0.06194, simple_loss=0.08492, pruned_loss=0.009282, audio_tagging_loss=0.0102, over 15288.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08996, pruned_loss=0.01196, audio_tagging_loss=0.008574, over 3043236.48 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:16:45,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3711073.3333333335, ans=0.5 2023-11-28 23:16:48,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. 
limit=15.0 2023-11-28 23:16:51,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3711140.0, ans=0.125 2023-11-28 23:17:00,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3711206.6666666665, ans=0.1 2023-11-28 23:17:23,285 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-28 23:17:24,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3711273.3333333335, ans=0.125 2023-11-28 23:17:26,728 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3600, loss[loss=0.06435, simple_loss=0.09026, pruned_loss=0.01165, audio_tagging_loss=0.007567, over 14709.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08916, pruned_loss=0.01184, audio_tagging_loss=0.008511, over 3040559.58 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:17:42,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.750e+01 9.399e+01 1.010e+02 1.318e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 23:18:00,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3711473.3333333335, ans=0.125 2023-11-28 23:18:20,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3711606.6666666665, ans=0.1 2023-11-28 23:18:23,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-28 23:18:27,668 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3650, loss[loss=0.04099, simple_loss=0.05647, pruned_loss=0.005044, audio_tagging_loss=0.007713, over 15040.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08871, pruned_loss=0.0118, audio_tagging_loss=0.008459, over 3042303.20 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:18:30,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3711673.3333333335, ans=0.125 2023-11-28 23:18:30,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3711673.3333333335, ans=0.125 2023-11-28 23:18:38,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3711740.0, ans=0.05 2023-11-28 23:18:39,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3711740.0, ans=0.125 2023-11-28 23:18:39,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3711740.0, ans=0.125 2023-11-28 23:18:44,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3711740.0, ans=0.0 2023-11-28 23:18:49,252 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:18:55,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3711806.6666666665, ans=0.0 2023-11-28 23:18:58,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.95 vs. 
limit=10.0 2023-11-28 23:19:01,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3711806.6666666665, ans=0.125 2023-11-28 23:19:25,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-28 23:19:25,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3711940.0, ans=0.0 2023-11-28 23:19:29,807 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3700, loss[loss=0.06996, simple_loss=0.09921, pruned_loss=0.01026, audio_tagging_loss=0.01009, over 16227.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08964, pruned_loss=0.012, audio_tagging_loss=0.008441, over 3048213.10 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:19:47,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.931e+01 9.622e+01 1.040e+02 1.365e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-28 23:19:55,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3712140.0, ans=0.125 2023-11-28 23:20:04,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712140.0, ans=0.1 2023-11-28 23:20:10,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3712206.6666666665, ans=0.0 2023-11-28 23:20:11,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3712206.6666666665, ans=0.0 2023-11-28 23:20:17,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-28 23:20:25,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3712273.3333333335, ans=0.125 2023-11-28 23:20:28,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-28 23:20:32,142 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3750, loss[loss=0.06533, simple_loss=0.09612, pruned_loss=0.01079, audio_tagging_loss=0.006478, over 14844.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08963, pruned_loss=0.01209, audio_tagging_loss=0.008461, over 3048280.09 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:20:32,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3712340.0, ans=0.125 2023-11-28 23:20:51,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3712406.6666666665, ans=0.0 2023-11-28 23:21:00,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3712473.3333333335, ans=0.125 2023-11-28 23:21:13,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3712540.0, ans=0.0 2023-11-28 23:21:13,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2023-11-28 23:21:16,952 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:21:19,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3712540.0, ans=0.125 2023-11-28 23:21:25,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3712606.6666666665, ans=0.04949747468305833 2023-11-28 23:21:30,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-28 23:21:33,593 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3800, loss[loss=0.06041, simple_loss=0.07945, pruned_loss=0.01245, audio_tagging_loss=0.008238, over 15650.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08982, pruned_loss=0.01228, audio_tagging_loss=0.008447, over 3046851.63 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:21:39,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3712673.3333333335, ans=0.0 2023-11-28 23:21:51,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-28 23:21:52,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.439e+01 1.001e+02 1.076e+02 2.686e+02, threshold=2.002e+02, percent-clipped=1.0 2023-11-28 23:22:25,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2023-11-28 23:22:31,963 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-28 23:22:35,466 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3850, loss[loss=0.04988, simple_loss=0.06798, pruned_loss=0.005843, audio_tagging_loss=0.01005, over 14537.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08978, pruned_loss=0.0122, audio_tagging_loss=0.008568, over 3045194.04 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:22:59,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3713140.0, ans=0.0 2023-11-28 23:23:00,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3713140.0, ans=10.0 2023-11-28 23:23:24,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3713273.3333333335, ans=0.125 2023-11-28 23:23:33,721 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-28 23:23:38,036 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3900, loss[loss=0.06456, simple_loss=0.09518, pruned_loss=0.01073, audio_tagging_loss=0.00624, over 14128.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08956, pruned_loss=0.01218, audio_tagging_loss=0.008603, over 3042088.78 frames. 
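Each Exclude-cut WARNING above fires on an AudioSet clip whose transcript is the dummy placeholder string: after the convolutional front-end's roughly 4x subsampling, a 100-frame cut keeps only 23 frames, and 24 BPE tokens cannot be aligned monotonically to 23 frames, so the cut is dropped from transducer training. A sketch of that rule; the subsampling arithmetic below is an assumption chosen to reproduce the logged 100 -> 23 mapping:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end arithmetic; it maps 100 -> 23, matching the
    # "before subsampling: 100 ... after subsampling: 23" numbers above.
    return (((num_frames - 7) // 2) + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Mirror of the exclusion rule the WARNING records describe: drop a
    cut when it has fewer post-subsampling frames than BPE tokens, since
    an alignment of 24 tokens to 23 frames is impossible."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False   # the excluded cuts above
```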
], batch size: 52, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:23:40,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3713340.0, ans=0.125 2023-11-28 23:23:40,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3713340.0, ans=0.0 2023-11-28 23:23:40,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3713340.0, ans=0.0 2023-11-28 23:23:55,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.938e+01 9.522e+01 1.035e+02 1.409e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:23:59,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3713406.6666666665, ans=0.0 2023-11-28 23:24:14,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3713540.0, ans=0.0 2023-11-28 23:24:19,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.90 vs. limit=15.0 2023-11-28 23:24:20,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3713540.0, ans=0.0 2023-11-28 23:24:34,886 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-28 23:24:38,274 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 3950, loss[loss=0.06902, simple_loss=0.09081, pruned_loss=0.01097, audio_tagging_loss=0.01264, over 14537.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08892, pruned_loss=0.01198, audio_tagging_loss=0.008803, over 3043143.71 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:24:48,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3713673.3333333335, ans=0.125 2023-11-28 23:24:52,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2023-11-28 23:25:05,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3713806.6666666665, ans=0.125 2023-11-28 23:25:34,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3713940.0, ans=0.125 2023-11-28 23:25:37,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-28 23:25:41,351 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4000, loss[loss=0.05868, simple_loss=0.07991, pruned_loss=0.006747, audio_tagging_loss=0.01198, over 14886.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0896, pruned_loss=0.01197, audio_tagging_loss=0.008887, over 3040832.62 frames. 
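The optim.py:476 lines summarize recent per-step gradient norms as five quantiles (min, 25%, median, 75%, max) together with the clipping threshold and the share of clipped steps. Across the records above the threshold tracks twice the median, e.g. 2.0 * 9.522e+01 = 1.904e+02, consistent with Clipping_scale=2.0. A sketch of that bookkeeping over a hypothetical window of norms; the median-based rule is inferred from the logged numbers:

```python
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quantiles and clipping threshold in the style of the optim.py
    records above. `grad_norms` is a hypothetical 1-D tensor holding the
    last few hundred per-step gradient norms."""
    quartiles = torch.quantile(
        grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    # Threshold = clipping_scale * median, matching e.g.
    # 2.0 * 9.522e+01 = 1.904e+02 in the record above.
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```

percent-clipped stays at 0.0 in most records because the maximum norm (around 1.4e+02 here) sits well below the threshold; the occasional percent-clipped=1.0 coincides with an outlier max such as the 2.686e+02 logged above.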
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:25:45,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3714006.6666666665, ans=0.125 2023-11-28 23:25:59,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.920e+01 9.493e+01 1.035e+02 1.641e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 23:26:00,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3714073.3333333335, ans=0.0 2023-11-28 23:26:38,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-28 23:26:38,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-28 23:26:43,004 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4050, loss[loss=0.068, simple_loss=0.08714, pruned_loss=0.01586, audio_tagging_loss=0.008569, over 14649.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08957, pruned_loss=0.01194, audio_tagging_loss=0.008827, over 3042467.21 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:26:47,714 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:26:55,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3714406.6666666665, ans=0.1 2023-11-28 23:27:05,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2023-11-28 23:27:15,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=12.0 2023-11-28 23:27:16,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=22.5 2023-11-28 23:27:41,487 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-28 23:27:44,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3714673.3333333335, ans=0.0 2023-11-28 23:27:45,252 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4100, loss[loss=0.06789, simple_loss=0.09685, pruned_loss=0.01026, audio_tagging_loss=0.009204, over 15056.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08947, pruned_loss=0.01187, audio_tagging_loss=0.00878, over 3046599.91 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:27:47,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-28 23:27:55,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-28 23:27:56,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3714740.0, ans=0.0 2023-11-28 23:28:03,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 9.138e+01 9.541e+01 1.028e+02 1.498e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 23:28:21,319 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:28:42,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3714940.0, ans=0.1 2023-11-28 23:28:43,505 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-28 23:28:46,832 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4150, loss[loss=0.05826, simple_loss=0.08292, pruned_loss=0.007945, audio_tagging_loss=0.008854, over 15004.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09021, pruned_loss=0.01203, audio_tagging_loss=0.008663, over 3044359.60 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:28:50,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3715006.6666666665, ans=0.035 2023-11-28 23:29:03,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2023-11-28 23:29:09,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3715073.3333333335, ans=0.0 2023-11-28 23:29:23,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-28 23:29:23,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-28 23:29:29,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3715206.6666666665, ans=0.0 2023-11-28 23:29:30,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715206.6666666665, ans=0.1 2023-11-28 23:29:33,551 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 23:29:39,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3715273.3333333335, ans=0.2 2023-11-28 23:29:44,755 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-28 23:29:47,335 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:29:47,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3715340.0, ans=0.0 2023-11-28 23:29:48,201 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4200, loss[loss=0.09447, simple_loss=0.1303, pruned_loss=0.02144, audio_tagging_loss=0.007893, over 16010.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08952, pruned_loss=0.01196, audio_tagging_loss=0.008609, over 3039334.20 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:30:06,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.859e+01 9.416e+01 1.036e+02 1.524e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 23:30:08,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3715406.6666666665, ans=0.2 2023-11-28 23:30:10,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3715406.6666666665, ans=0.125 2023-11-28 23:30:17,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3715473.3333333335, ans=0.125 2023-11-28 23:30:34,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3715540.0, ans=0.0 2023-11-28 23:30:34,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715540.0, ans=0.1 2023-11-28 23:30:43,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3715606.6666666665, ans=0.125 2023-11-28 23:30:43,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3715606.6666666665, ans=0.0 2023-11-28 23:30:46,510 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-28 23:30:49,967 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4250, loss[loss=0.05582, simple_loss=0.06892, pruned_loss=0.01188, audio_tagging_loss=0.009479, over 14484.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08917, pruned_loss=0.0119, audio_tagging_loss=0.008526, over 3042739.90 frames. 
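Every scaling.py:213 line above reports the current value (ans) of a ScheduledFloat: a regularization hyperparameter (dropout_p, attention/conv/ff skip rates, balancer probabilities, scale_min) that is a function of batch_count rather than a constant. By batch_count around 3.7e6, most skip rates have annealed to 0.0 while dropout_p holds at 0.1. A minimal piecewise-linear stand-in; the breakpoints below are invented for illustration:

```python
import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count, a minimal stand-in for
    the ScheduledFloat values that scaling.py logs above. Points are
    (batch_count, value) pairs; outside their range the schedule clamps
    to the endpoint values."""
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip rate that anneals from 0.5 to 0.0 over the first 20k batches
# (made-up breakpoints) is flat at 0.0 by the batch counts logged above:
skip_rate = ScheduledFloatSketch((0, 0.5), (20000, 0.0))
assert skip_rate.value(3712540.0) == 0.0
```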
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:30:53,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3715673.3333333335, ans=0.0 2023-11-28 23:30:54,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3715673.3333333335, ans=0.0 2023-11-28 23:30:57,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3715673.3333333335, ans=0.2 2023-11-28 23:31:12,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3715740.0, ans=0.07 2023-11-28 23:31:12,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3715740.0, ans=10.0 2023-11-28 23:31:19,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3715806.6666666665, ans=0.125 2023-11-28 23:31:28,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715873.3333333335, ans=0.1 2023-11-28 23:31:30,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=22.5 2023-11-28 23:31:32,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3715873.3333333335, ans=0.2 2023-11-28 23:31:33,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3715873.3333333335, ans=0.2 2023-11-28 23:31:47,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-28 23:31:51,616 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4300, loss[loss=0.09234, simple_loss=0.1318, pruned_loss=0.02124, audio_tagging_loss=0.005199, over 15514.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09093, pruned_loss=0.01208, audio_tagging_loss=0.008322, over 3041609.56 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:31:59,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2023-11-28 23:32:09,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 9.110e+01 9.607e+01 1.023e+02 1.243e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 23:32:21,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3716140.0, ans=0.125 2023-11-28 23:32:49,462 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-28 23:32:53,533 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4350, loss[loss=0.05783, simple_loss=0.08298, pruned_loss=0.006777, audio_tagging_loss=0.009563, over 14203.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09155, pruned_loss=0.01228, audio_tagging_loss=0.00828, over 3048472.18 frames. 
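Each Whitening line above compares a statistic of a module's activations against a limit (e.g. metric=11.04 vs. limit=22.5): the metric reads as about 1.0 for "white" activations whose channel covariance is a multiple of the identity, and grows as channels become correlated or unequal in scale, with a corrective gradient applied only past the limit. One self-consistent way to build such a metric, offered as a guess at the flavor rather than the project's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """A plausible whitening statistic matching how the scaling.py records
    above read. x is assumed to be (num_frames, num_channels).

    Equals 1.0 exactly when the channel covariance is a multiple of the
    identity; larger values mean less 'white' activations."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.t() @ x / x.shape[0]          # (C, C) channel covariance
    mean_diag = cov.diagonal().mean()     # average per-channel variance
    # Frobenius energy of cov relative to C * (mean variance)^2; by
    # Cauchy-Schwarz this ratio is >= 1, with equality iff cov = c * I.
    return (cov ** 2).sum() / (cov.shape[0] * mean_diag ** 2 + 1e-20)
```

On this definition the metric is always at least 1.0, so limits like 15.0 or 22.5 tolerate a fair amount of channel structure before any penalty kicks in, which is why most lines above report metric values comfortably under their limits.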
], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:32:56,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3716340.0, ans=0.125 2023-11-28 23:33:00,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3716340.0, ans=0.125 2023-11-28 23:33:01,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3716340.0, ans=0.1 2023-11-28 23:33:13,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.29 vs. limit=22.5 2023-11-28 23:33:52,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-28 23:33:55,554 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4400, loss[loss=0.1061, simple_loss=0.1391, pruned_loss=0.02867, audio_tagging_loss=0.007871, over 14971.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.0908, pruned_loss=0.01232, audio_tagging_loss=0.008349, over 3044730.07 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:34:07,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3716740.0, ans=0.04949747468305833 2023-11-28 23:34:15,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.001e+01 9.645e+01 1.064e+02 1.630e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 23:34:18,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3716740.0, ans=0.0 2023-11-28 23:34:24,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3716806.6666666665, ans=0.125 2023-11-28 23:34:29,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3716806.6666666665, ans=0.0 2023-11-28 23:34:31,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3716873.3333333335, ans=0.125 2023-11-28 23:34:40,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3716873.3333333335, ans=0.125 2023-11-28 23:34:54,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-28 23:34:57,366 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4450, loss[loss=0.07747, simple_loss=0.1123, pruned_loss=0.01499, audio_tagging_loss=0.006345, over 15447.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0907, pruned_loss=0.01223, audio_tagging_loss=0.008339, over 3041110.32 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:34:58,367 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:35:16,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3717073.3333333335, ans=0.0 2023-11-28 23:35:30,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2023-11-28 23:35:47,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.28 vs. 
limit=15.0 2023-11-28 23:35:55,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-28 23:36:00,226 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4500, loss[loss=0.06012, simple_loss=0.08662, pruned_loss=0.008727, audio_tagging_loss=0.008082, over 15132.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.09005, pruned_loss=0.01191, audio_tagging_loss=0.008368, over 3044694.29 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:36:19,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.979e+01 9.760e+01 1.042e+02 1.445e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 23:36:28,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3717473.3333333335, ans=0.125 2023-11-28 23:36:34,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3717473.3333333335, ans=0.125 2023-11-28 23:36:44,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3717540.0, ans=0.2 2023-11-28 23:36:47,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3717540.0, ans=0.125 2023-11-28 23:36:49,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3717606.6666666665, ans=0.125 2023-11-28 23:36:57,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2023-11-28 23:36:58,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-28 23:37:01,997 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4550, loss[loss=0.06189, simple_loss=0.08172, pruned_loss=0.01106, audio_tagging_loss=0.009972, over 15600.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08846, pruned_loss=0.01166, audio_tagging_loss=0.008483, over 3045876.75 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:37:08,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3717673.3333333335, ans=0.125 2023-11-28 23:37:10,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2023-11-28 23:37:12,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3717740.0, ans=0.125 2023-11-28 23:37:46,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3717873.3333333335, ans=0.125 2023-11-28 23:37:47,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2023-11-28 23:37:50,814 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 23:37:59,041 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-28 23:38:01,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3718006.6666666665, ans=0.0 2023-11-28 23:38:02,412 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4600, loss[loss=0.06373, simple_loss=0.07899, pruned_loss=0.01454, audio_tagging_loss=0.009693, over 14917.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08854, pruned_loss=0.01176, audio_tagging_loss=0.008522, over 3040334.13 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:38:10,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3718006.6666666665, ans=0.125 2023-11-28 23:38:13,186 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:38:21,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2023-11-28 23:38:22,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 9.011e+01 9.487e+01 1.017e+02 1.254e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 23:38:48,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3718206.6666666665, ans=0.2 2023-11-28 23:39:01,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-28 23:39:04,627 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4650, loss[loss=0.07414, simple_loss=0.1051, pruned_loss=0.01219, audio_tagging_loss=0.009386, over 15207.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08927, pruned_loss=0.01198, audio_tagging_loss=0.008576, over 3039765.90 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:39:23,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3718406.6666666665, ans=0.125 2023-11-28 23:39:28,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3718473.3333333335, ans=0.125 2023-11-28 23:39:53,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3718606.6666666665, ans=0.0 2023-11-28 23:39:57,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3718606.6666666665, ans=0.0 2023-11-28 23:39:58,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3718606.6666666665, ans=0.125 2023-11-28 23:40:03,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-28 23:40:07,629 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4700, loss[loss=0.05646, simple_loss=0.07137, pruned_loss=0.01047, audio_tagging_loss=0.0103, over 14561.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09044, pruned_loss=0.01223, audio_tagging_loss=0.008709, over 3049080.87 frames. 
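Because training runs in fp16, each loss record also carries the current dynamic loss scale: grad_scale moves between 8.0 and 32.0 across the batches above, halving when a step produces inf/nan gradients and doubling again after a long run of clean steps. A generic sketch of that scheme; the constants follow common GradScaler defaults and are not necessarily this trainer's:

```python
class DynamicLossScaleSketch:
    """Generic dynamic fp16 loss scaling of the kind reflected by the
    grad_scale values (8.0 / 16.0 / 32.0) in the records above."""
    def __init__(self, init_scale: float = 16.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            # Overflow: the step is skipped and the scale is halved.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0
        return self.scale
```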
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:40:23,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3718740.0, ans=0.0 2023-11-28 23:40:25,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.230e+01 9.778e+01 1.067e+02 1.457e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-28 23:40:42,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3718806.6666666665, ans=0.125 2023-11-28 23:40:52,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3718873.3333333335, ans=0.0 2023-11-28 23:41:01,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3718940.0, ans=0.035 2023-11-28 23:41:04,891 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-28 23:41:08,268 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4750, loss[loss=0.04947, simple_loss=0.06203, pruned_loss=0.007286, audio_tagging_loss=0.01117, over 14855.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09005, pruned_loss=0.01213, audio_tagging_loss=0.008801, over 3048822.62 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:41:18,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3719006.6666666665, ans=0.0 2023-11-28 23:41:22,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3719073.3333333335, ans=0.04949747468305833 2023-11-28 23:41:24,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2023-11-28 23:41:24,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3719073.3333333335, ans=0.125 2023-11-28 23:41:36,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3719140.0, ans=0.1 2023-11-28 23:41:46,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3719206.6666666665, ans=0.125 2023-11-28 23:41:46,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2023-11-28 23:41:49,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3719206.6666666665, ans=0.1 2023-11-28 23:42:06,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-28 23:42:10,522 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4800, loss[loss=0.05034, simple_loss=0.05964, pruned_loss=0.007653, audio_tagging_loss=0.01287, over 15846.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09096, pruned_loss=0.01216, audio_tagging_loss=0.00887, over 3053250.27 frames. ], batch size: 64, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:42:20,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3719340.0, ans=0.2 2023-11-28 23:42:23,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.37 vs. 
limit=15.0 2023-11-28 23:42:30,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 9.011e+01 9.522e+01 1.013e+02 1.336e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:43:09,175 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-28 23:43:09,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3719606.6666666665, ans=0.0 2023-11-28 23:43:12,642 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4850, loss[loss=0.06516, simple_loss=0.0835, pruned_loss=0.01439, audio_tagging_loss=0.009017, over 15917.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09023, pruned_loss=0.01214, audio_tagging_loss=0.008985, over 3057012.54 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:43:29,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3719740.0, ans=0.125 2023-11-28 23:43:40,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3719806.6666666665, ans=0.2 2023-11-28 23:43:58,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3719873.3333333335, ans=0.1 2023-11-28 23:44:09,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.92 vs. limit=5.0 2023-11-28 23:44:10,653 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-28 23:44:14,579 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4900, loss[loss=0.07102, simple_loss=0.1032, pruned_loss=0.01248, audio_tagging_loss=0.006919, over 14417.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09007, pruned_loss=0.01218, audio_tagging_loss=0.008958, over 3050122.19 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:44:35,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.818e+01 9.390e+01 1.021e+02 1.310e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 23:44:36,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=3720073.3333333335, ans=22.5 2023-11-28 23:44:40,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3720140.0, ans=0.125 2023-11-28 23:44:45,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0 2023-11-28 23:45:12,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-28 23:45:16,110 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 4950, loss[loss=0.05639, simple_loss=0.0844, pruned_loss=0.007288, audio_tagging_loss=0.006899, over 14910.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08954, pruned_loss=0.01209, audio_tagging_loss=0.008772, over 3041843.58 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:45:25,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3720340.0, ans=0.125 2023-11-28 23:45:25,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3720340.0, ans=0.0 2023-11-28 23:45:35,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3720406.6666666665, ans=0.125 2023-11-28 23:45:38,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3720406.6666666665, ans=0.0 2023-11-28 23:45:56,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3720540.0, ans=0.1 2023-11-28 23:46:01,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3720540.0, ans=0.0 2023-11-28 23:46:05,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3720606.6666666665, ans=0.04949747468305833 2023-11-28 23:46:07,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.01 vs. limit=10.0 2023-11-28 23:46:14,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-28 23:46:15,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3720606.6666666665, ans=0.0 2023-11-28 23:46:18,293 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5000, loss[loss=0.06092, simple_loss=0.08363, pruned_loss=0.01089, audio_tagging_loss=0.008218, over 14558.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08868, pruned_loss=0.01197, audio_tagging_loss=0.008683, over 3041190.81 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:46:31,628 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:46:31,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3720740.0, ans=0.2 2023-11-28 23:46:38,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.963e+01 9.566e+01 1.007e+02 2.358e+02, threshold=1.913e+02, percent-clipped=1.0 2023-11-28 23:46:47,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3720806.6666666665, ans=0.0 2023-11-28 23:46:56,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3720873.3333333335, ans=0.125 2023-11-28 23:47:15,724 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-28 23:47:15,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3720940.0, ans=0.0 2023-11-28 23:47:19,193 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5050, loss[loss=0.04484, simple_loss=0.05327, pruned_loss=0.006714, audio_tagging_loss=0.0115, over 15610.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08884, pruned_loss=0.0121, audio_tagging_loss=0.008597, over 3041236.67 frames. 
], batch size: 63, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:47:33,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2023-11-28 23:47:41,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3721073.3333333335, ans=0.125 2023-11-28 23:48:16,744 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-28 23:48:21,051 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5100, loss[loss=0.05402, simple_loss=0.06093, pruned_loss=0.009078, audio_tagging_loss=0.01448, over 14596.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08829, pruned_loss=0.01204, audio_tagging_loss=0.008722, over 3033683.47 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:48:24,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3721340.0, ans=0.125 2023-11-28 23:48:36,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3721406.6666666665, ans=0.1 2023-11-28 23:48:44,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.897e+01 9.648e+01 1.044e+02 1.353e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 23:48:55,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3721473.3333333335, ans=0.125 2023-11-28 23:48:58,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3721540.0, ans=0.125 2023-11-28 23:49:18,610 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-28 23:49:21,919 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5150, loss[loss=0.05485, simple_loss=0.07368, pruned_loss=0.007394, audio_tagging_loss=0.01062, over 14720.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08753, pruned_loss=0.012, audio_tagging_loss=0.008796, over 3038868.97 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:49:45,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3721740.0, ans=0.0 2023-11-28 23:49:52,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3721806.6666666665, ans=0.0 2023-11-28 23:50:05,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0 2023-11-28 23:50:20,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3721940.0, ans=0.125 2023-11-28 23:50:21,678 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-28 23:50:25,160 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5200, loss[loss=0.06143, simple_loss=0.08337, pruned_loss=0.00902, audio_tagging_loss=0.01073, over 14831.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08889, pruned_loss=0.01212, audio_tagging_loss=0.008675, over 3037472.10 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:50:28,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3722006.6666666665, ans=0.125 2023-11-28 23:50:46,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.041e+01 9.653e+01 1.034e+02 1.419e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 23:50:57,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=15.0 2023-11-28 23:51:01,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3722206.6666666665, ans=0.125 2023-11-28 23:51:04,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3722206.6666666665, ans=0.125 2023-11-28 23:51:07,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3722206.6666666665, ans=0.04949747468305833 2023-11-28 23:51:11,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-11-28 23:51:22,587 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-28 23:51:26,658 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5250, loss[loss=0.05808, simple_loss=0.07649, pruned_loss=0.01126, audio_tagging_loss=0.008578, over 15346.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08858, pruned_loss=0.01213, audio_tagging_loss=0.008588, over 3033057.10 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:51:38,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3722406.6666666665, ans=0.2 2023-11-28 23:51:43,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=12.0 2023-11-28 23:51:44,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3722406.6666666665, ans=0.125 2023-11-28 23:51:47,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-11-28 23:52:10,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=22.5 2023-11-28 23:52:16,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-28 23:52:24,366 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-28 23:52:24,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-28 23:52:28,351 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5300, loss[loss=0.07423, simple_loss=0.09929, pruned_loss=0.0157, audio_tagging_loss=0.008888, over 15135.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08966, pruned_loss=0.01237, audio_tagging_loss=0.0085, over 3035491.30 frames. 
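
Each scaling.py:213 record tracks a schedule-controlled hyperparameter (skip rates, dropout_p, balancer probabilities) whose current value, `ans`, depends only on `batch_count`. A piecewise-linear schedule over batch count, held flat outside its endpoints, is consistent with the constant values logged at this stage of training; the sketch below assumes that form and is not the project's ScheduledFloat class:

    from bisect import bisect_right

    # Assumed form: piecewise-linear in batch_count between sorted
    # (batch_count, value) breakpoints, clamped flat outside them.
    def scheduled_float(batch_count, schedule):
        # schedule e.g. [(0.0, 0.3), (20000.0, 0.1)]
        points = [p for p, _ in schedule]
        if batch_count <= points[0]:
            return schedule[0][1]
        if batch_count >= points[-1]:
            return schedule[-1][1]
        i = bisect_right(points, batch_count)
        (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # At batch_count ~ 3.72e6 every schedule is far past its last breakpoint,
    # which is why each name logs an unchanging ans here.
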
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:52:31,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3722673.3333333335, ans=0.125 2023-11-28 23:52:36,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3722673.3333333335, ans=0.125 2023-11-28 23:52:36,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0 2023-11-28 23:52:50,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.120e+01 9.836e+01 1.047e+02 1.238e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 23:53:04,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-11-28 23:53:26,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-28 23:53:26,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3722940.0, ans=0.0 2023-11-28 23:53:28,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3722940.0, ans=0.0 2023-11-28 23:53:30,166 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5350, loss[loss=0.08582, simple_loss=0.1236, pruned_loss=0.01758, audio_tagging_loss=0.006434, over 15298.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08977, pruned_loss=0.01243, audio_tagging_loss=0.008476, over 3032378.80 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:53:44,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3723073.3333333335, ans=0.0 2023-11-28 23:54:17,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3723206.6666666665, ans=0.2 2023-11-28 23:54:28,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-28 23:54:31,519 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5400, loss[loss=0.08844, simple_loss=0.1284, pruned_loss=0.01654, audio_tagging_loss=0.00773, over 15716.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08969, pruned_loss=0.01242, audio_tagging_loss=0.008481, over 3044368.54 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:54:34,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3723340.0, ans=0.1 2023-11-28 23:54:43,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3723406.6666666665, ans=0.125 2023-11-28 23:54:54,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.310e+01 8.983e+01 9.673e+01 1.019e+02 1.246e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 23:55:08,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-28 23:55:29,976 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-28 23:55:33,321 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5450, loss[loss=0.06502, simple_loss=0.08783, pruned_loss=0.01103, audio_tagging_loss=0.01008, over 14368.00 frames. 
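
Note how grad_scale in the loss records moves between 8.0 and 32.0 (halving around batches 4900 and 5050, then doubling back by batches 5200 and 5600): the signature of dynamic mixed-precision loss scaling, which halves the scale on overflow and doubles it after a run of finite steps. A sketch of that generic update rule, assumed rather than taken from the training loop:

    # Generic dynamic loss-scale rule consistent with the grad_scale
    # trajectory above (assumed behaviour, not the actual scaler code).
    def update_grad_scale(scale, found_inf, steps_since_growth, growth_interval=2000):
        if found_inf:
            return scale * 0.5, 0          # overflow: halve the scale, skip the step
        if steps_since_growth + 1 >= growth_interval:
            return scale * 2.0, 0          # enough stable steps: double the scale
        return scale, steps_since_growth + 1
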
], tot_loss[loss=0.06563, simple_loss=0.08973, pruned_loss=0.01222, audio_tagging_loss=0.008545, over 3048354.40 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:55:34,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-28 23:55:40,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3723673.3333333335, ans=0.125 2023-11-28 23:55:48,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3723740.0, ans=0.0 2023-11-28 23:55:57,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723806.6666666665, ans=0.1 2023-11-28 23:56:31,918 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-28 23:56:34,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-28 23:56:34,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-28 23:56:35,653 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5500, loss[loss=0.05224, simple_loss=0.06899, pruned_loss=0.009275, audio_tagging_loss=0.008474, over 14989.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08947, pruned_loss=0.01225, audio_tagging_loss=0.008584, over 3043475.67 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:56:44,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724006.6666666665, ans=0.1 2023-11-28 23:56:57,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.978e+01 9.679e+01 1.033e+02 1.249e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 23:56:58,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2023-11-28 23:57:00,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2023-11-28 23:57:15,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3724206.6666666665, ans=0.0 2023-11-28 23:57:33,461 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-28 23:57:36,766 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5550, loss[loss=0.0638, simple_loss=0.09119, pruned_loss=0.01193, audio_tagging_loss=0.006268, over 15188.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.0887, pruned_loss=0.01213, audio_tagging_loss=0.008649, over 3047285.89 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:57:50,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-28 23:58:29,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. 
limit=15.0 2023-11-28 23:58:35,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-28 23:58:38,540 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5600, loss[loss=0.06943, simple_loss=0.09078, pruned_loss=0.01519, audio_tagging_loss=0.008851, over 14583.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08936, pruned_loss=0.01198, audio_tagging_loss=0.008802, over 3050245.49 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:58:56,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3724740.0, ans=0.125 2023-11-28 23:59:00,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.219e+01 9.778e+01 1.037e+02 1.295e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-28 23:59:05,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3724806.6666666665, ans=0.125 2023-11-28 23:59:21,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3724873.3333333335, ans=0.125 2023-11-28 23:59:24,252 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:59:26,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724940.0, ans=0.1 2023-11-28 23:59:30,890 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:59:31,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3724940.0, ans=0.125 2023-11-28 23:59:37,050 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-28 23:59:40,499 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5650, loss[loss=0.05426, simple_loss=0.07365, pruned_loss=0.008709, audio_tagging_loss=0.008725, over 14824.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.0901, pruned_loss=0.01206, audio_tagging_loss=0.008791, over 3058350.43 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:59:46,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. 
limit=15.0 2023-11-28 23:59:48,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3725006.6666666665, ans=0.125 2023-11-29 00:00:11,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3725140.0, ans=0.125 2023-11-29 00:00:12,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3725140.0, ans=0.125 2023-11-29 00:00:22,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3725206.6666666665, ans=0.2 2023-11-29 00:00:34,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3725273.3333333335, ans=0.1 2023-11-29 00:00:37,881 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-29 00:00:41,907 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5700, loss[loss=0.06465, simple_loss=0.09152, pruned_loss=0.01144, audio_tagging_loss=0.007451, over 14654.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08947, pruned_loss=0.01193, audio_tagging_loss=0.008826, over 3050780.65 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:01:02,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3725406.6666666665, ans=0.0 2023-11-29 00:01:04,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.841e+01 9.405e+01 1.014e+02 1.366e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-29 00:01:08,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3725473.3333333335, ans=0.125 2023-11-29 00:01:17,376 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:01:21,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-11-29 00:01:29,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3725540.0, ans=0.2 2023-11-29 00:01:30,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3725606.6666666665, ans=0.125 2023-11-29 00:01:35,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-29 00:01:41,171 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-29 00:01:43,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3725673.3333333335, ans=0.125 2023-11-29 00:01:44,643 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5750, loss[loss=0.06948, simple_loss=0.09891, pruned_loss=0.01199, audio_tagging_loss=0.008026, over 15243.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08914, pruned_loss=0.01193, audio_tagging_loss=0.008686, over 3045652.97 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:02:03,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.28 vs. 
limit=22.5 2023-11-29 00:02:04,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3725740.0, ans=0.125 2023-11-29 00:02:04,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3725740.0, ans=0.0 2023-11-29 00:02:05,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3725740.0, ans=0.2 2023-11-29 00:02:23,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3725873.3333333335, ans=0.5 2023-11-29 00:02:23,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-29 00:02:34,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-11-29 00:02:41,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3725940.0, ans=0.125 2023-11-29 00:02:42,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-29 00:02:42,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3725940.0, ans=0.125 2023-11-29 00:02:46,202 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5800, loss[loss=0.07996, simple_loss=0.1197, pruned_loss=0.01243, audio_tagging_loss=0.007701, over 16338.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08843, pruned_loss=0.0118, audio_tagging_loss=0.00866, over 3044672.05 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:03:08,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 8.867e+01 9.470e+01 1.000e+02 1.681e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-29 00:03:13,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-11-29 00:03:17,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3726140.0, ans=0.2 2023-11-29 00:03:42,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3726273.3333333335, ans=0.125 2023-11-29 00:03:43,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-29 00:03:46,505 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5850, loss[loss=0.07313, simple_loss=0.1004, pruned_loss=0.01667, audio_tagging_loss=0.006279, over 14370.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.088, pruned_loss=0.01184, audio_tagging_loss=0.008614, over 3038789.05 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:03:50,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3726340.0, ans=0.0 2023-11-29 00:04:29,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. 
limit=6.0 2023-11-29 00:04:35,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3726606.6666666665, ans=0.1 2023-11-29 00:04:44,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-29 00:04:47,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3726673.3333333335, ans=0.5 2023-11-29 00:04:49,138 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5900, loss[loss=0.05498, simple_loss=0.07243, pruned_loss=0.01058, audio_tagging_loss=0.00818, over 15785.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08807, pruned_loss=0.0118, audio_tagging_loss=0.008537, over 3043993.31 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:04:52,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3726673.3333333335, ans=0.0 2023-11-29 00:05:04,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3726740.0, ans=0.125 2023-11-29 00:05:08,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=22.5 2023-11-29 00:05:12,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.979e+01 9.571e+01 1.024e+02 1.288e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 00:05:12,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3726806.6666666665, ans=0.125 2023-11-29 00:05:42,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3726940.0, ans=0.125 2023-11-29 00:05:47,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-29 00:05:51,238 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 5950, loss[loss=0.06168, simple_loss=0.07567, pruned_loss=0.01508, audio_tagging_loss=0.008772, over 13938.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08825, pruned_loss=0.01193, audio_tagging_loss=0.008555, over 3049251.82 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:06:01,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3727073.3333333335, ans=0.09899494936611666 2023-11-29 00:06:11,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-11-29 00:06:21,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3727140.0, ans=0.0 2023-11-29 00:06:33,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3727206.6666666665, ans=0.2 2023-11-29 00:06:48,360 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-29 00:06:51,704 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6000, loss[loss=0.04842, simple_loss=0.06155, pruned_loss=0.008149, audio_tagging_loss=0.009498, over 14835.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08863, pruned_loss=0.01193, audio_tagging_loss=0.008516, over 3046723.70 frames. 
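
The WARNING at 23:59:24 (and again at 00:08:17 below) drops one-second AudioSet cuts carrying the placeholder transcript: 100 input frames shrink to 23 after subsampling, fewer than the 24 BPE tokens, and the pruned transducer used here presumably allows at most one emitted symbol per frame, so no alignment exists. The filter plausibly amounts to the following; the names are ours:

    # Sketch of the exclusion rule implied by the warning: a cut whose frame
    # count after subsampling is below its token count cannot be aligned.
    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        return frames_after_subsampling >= num_tokens

    assert keep_cut(23, 24) is False       # the excluded 1 s AudioSet cuts above
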
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:06:51,705 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 00:07:31,870 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05752, simple_loss=0.05049, pruned_loss=0.005333, audio_tagging_loss=0.02694, over 4681554.00 frames. 2023-11-29 00:07:31,870 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 00:07:51,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3727406.6666666665, ans=0.1 2023-11-29 00:07:56,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 9.062e+01 9.671e+01 1.050e+02 2.392e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 00:08:17,555 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:08:27,780 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:08:31,038 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-29 00:08:34,404 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6050, loss[loss=0.05656, simple_loss=0.07553, pruned_loss=0.009062, audio_tagging_loss=0.009738, over 16054.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08871, pruned_loss=0.01191, audio_tagging_loss=0.008481, over 3052462.60 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:08:37,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3727673.3333333335, ans=0.125 2023-11-29 00:08:59,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3727806.6666666665, ans=0.125 2023-11-29 00:08:59,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=15.0 2023-11-29 00:09:08,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=12.0 2023-11-29 00:09:26,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3727940.0, ans=0.125 2023-11-29 00:09:31,167 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-29 00:09:34,944 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6100, loss[loss=0.0908, simple_loss=0.1189, pruned_loss=0.02318, audio_tagging_loss=0.008198, over 14597.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08808, pruned_loss=0.01193, audio_tagging_loss=0.008519, over 3051286.05 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:09:43,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. 
limit=22.5 2023-11-29 00:09:45,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-29 00:09:46,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-29 00:09:56,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.69 vs. limit=22.5 2023-11-29 00:09:58,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.989e+01 9.555e+01 1.035e+02 1.326e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 00:10:14,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3728206.6666666665, ans=0.1 2023-11-29 00:10:14,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3728206.6666666665, ans=0.0 2023-11-29 00:10:18,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3728206.6666666665, ans=0.1 2023-11-29 00:10:25,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3728273.3333333335, ans=0.1 2023-11-29 00:10:31,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-29 00:10:35,635 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6150, loss[loss=0.06197, simple_loss=0.08611, pruned_loss=0.01057, audio_tagging_loss=0.008338, over 14820.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08787, pruned_loss=0.01179, audio_tagging_loss=0.008539, over 3044850.59 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:11:06,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3728473.3333333335, ans=0.2 2023-11-29 00:11:17,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=15.0 2023-11-29 00:11:33,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-29 00:11:38,012 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6200, loss[loss=0.06864, simple_loss=0.0923, pruned_loss=0.01472, audio_tagging_loss=0.00777, over 15142.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08752, pruned_loss=0.01176, audio_tagging_loss=0.008609, over 3038221.93 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:11:38,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3728673.3333333335, ans=0.2 2023-11-29 00:11:39,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.20 vs. 
limit=22.5 2023-11-29 00:11:46,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3728673.3333333335, ans=0.125 2023-11-29 00:11:48,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3728740.0, ans=0.125 2023-11-29 00:12:01,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.926e+01 8.945e+01 9.631e+01 1.031e+02 1.323e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 00:12:06,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3728806.6666666665, ans=0.1 2023-11-29 00:12:35,573 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-29 00:12:36,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3728940.0, ans=0.125 2023-11-29 00:12:39,046 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6250, loss[loss=0.07825, simple_loss=0.1111, pruned_loss=0.01432, audio_tagging_loss=0.008365, over 15274.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08724, pruned_loss=0.01172, audio_tagging_loss=0.008673, over 3035689.52 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:12:49,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3729073.3333333335, ans=0.1 2023-11-29 00:13:33,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3729273.3333333335, ans=0.125 2023-11-29 00:13:36,082 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-29 00:13:39,777 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6300, loss[loss=0.08088, simple_loss=0.1114, pruned_loss=0.01871, audio_tagging_loss=0.006481, over 15803.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08851, pruned_loss=0.01196, audio_tagging_loss=0.008645, over 3038480.70 frames. 
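
The scaling.py:1022 records compare a per-module whiteness metric against a limit (e.g. metric=16.20 vs. limit=22.5 just above), apparently measuring how far a layer's activation covariance is from a scaled identity. One formulation with the logged behaviour is sketched below; it is our reconstruction, not the repository's implementation:

    import torch

    # Reconstruction of a whiteness metric: exactly 1.0 when the per-group
    # activation covariance is a multiple of the identity, growing toward
    # channels/groups as the variance collapses onto a single direction.
    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape                                      # (frames, channels)
        g = c // num_groups
        xg = x.reshape(n, num_groups, g).permute(1, 0, 2)   # (groups, frames, g)
        cov = torch.matmul(xg.transpose(1, 2), xg) / n      # per-group covariance
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean(dim=1)
        mean_sq = (cov ** 2).mean(dim=(1, 2)) * g
        return (mean_sq / (mean_diag ** 2 + 1e-20)).mean().item()
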
], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:13:39,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3729340.0, ans=0.125 2023-11-29 00:13:44,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3729340.0, ans=0.125 2023-11-29 00:13:47,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3729340.0, ans=0.0 2023-11-29 00:13:56,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3729406.6666666665, ans=0.0 2023-11-29 00:14:06,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.910e+01 9.740e+01 1.040e+02 1.205e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 00:14:18,674 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:14:30,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3729606.6666666665, ans=0.125 2023-11-29 00:14:35,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3729606.6666666665, ans=0.125 2023-11-29 00:14:39,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-29 00:14:43,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3729673.3333333335, ans=0.0 2023-11-29 00:14:43,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3729673.3333333335, ans=0.1 2023-11-29 00:14:43,934 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6350, loss[loss=0.06385, simple_loss=0.08247, pruned_loss=0.011, audio_tagging_loss=0.01162, over 15132.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08727, pruned_loss=0.01183, audio_tagging_loss=0.008822, over 3038300.42 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:14:59,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3729740.0, ans=0.125 2023-11-29 00:15:07,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3729806.6666666665, ans=0.0 2023-11-29 00:15:30,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3729873.3333333335, ans=0.04949747468305833 2023-11-29 00:15:32,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3729940.0, ans=0.125 2023-11-29 00:15:32,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.72 vs. 
limit=10.0 2023-11-29 00:15:34,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-29 00:15:34,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3729940.0, ans=0.2 2023-11-29 00:15:35,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3729940.0, ans=0.2 2023-11-29 00:15:38,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3729940.0, ans=0.125 2023-11-29 00:15:41,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3729940.0, ans=0.125 2023-11-29 00:15:42,323 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-29 00:15:42,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-29 00:15:44,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=22.5 2023-11-29 00:15:45,788 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6400, loss[loss=0.07703, simple_loss=0.1067, pruned_loss=0.01418, audio_tagging_loss=0.009481, over 15112.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08823, pruned_loss=0.01198, audio_tagging_loss=0.008876, over 3043777.26 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:15:46,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-29 00:16:08,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3730140.0, ans=0.1 2023-11-29 00:16:08,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3730140.0, ans=0.0 2023-11-29 00:16:10,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0 2023-11-29 00:16:10,794 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.936e+01 9.646e+01 1.038e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 00:16:39,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2023-11-29 00:16:43,613 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-29 00:16:47,009 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6450, loss[loss=0.08011, simple_loss=0.1084, pruned_loss=0.01446, audio_tagging_loss=0.01147, over 15423.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08805, pruned_loss=0.01208, audio_tagging_loss=0.008996, over 3041088.90 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:17:43,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3730606.6666666665, ans=22.5 2023-11-29 00:17:45,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3730606.6666666665, ans=0.2 2023-11-29 00:17:46,294 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-29 00:17:48,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-29 00:17:49,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3730673.3333333335, ans=0.125 2023-11-29 00:17:50,653 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6500, loss[loss=0.05524, simple_loss=0.07541, pruned_loss=0.008125, audio_tagging_loss=0.009413, over 15138.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08896, pruned_loss=0.01219, audio_tagging_loss=0.008891, over 3044346.46 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:17:57,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3730673.3333333335, ans=0.1 2023-11-29 00:18:07,605 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:18:15,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3730806.6666666665, ans=0.125 2023-11-29 00:18:16,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.149e+01 9.988e+01 1.072e+02 1.426e+02, threshold=1.998e+02, percent-clipped=0.0 2023-11-29 00:18:21,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3730806.6666666665, ans=0.0 2023-11-29 00:18:49,105 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-29 00:18:50,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3730940.0, ans=0.09899494936611666 2023-11-29 00:18:52,631 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6550, loss[loss=0.08548, simple_loss=0.1263, pruned_loss=0.01656, audio_tagging_loss=0.005793, over 15632.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08969, pruned_loss=0.01224, audio_tagging_loss=0.008694, over 3048182.81 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:19:18,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3731140.0, ans=0.125 2023-11-29 00:19:27,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3731140.0, ans=0.125 2023-11-29 00:19:29,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. 
limit=15.0 2023-11-29 00:19:30,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3731206.6666666665, ans=0.2 2023-11-29 00:19:51,091 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-29 00:19:51,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3731273.3333333335, ans=0.025 2023-11-29 00:19:54,514 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6600, loss[loss=0.05965, simple_loss=0.07886, pruned_loss=0.01275, audio_tagging_loss=0.007463, over 15825.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08841, pruned_loss=0.01198, audio_tagging_loss=0.008575, over 3047051.65 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:20:20,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.919e+01 9.465e+01 1.014e+02 1.286e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 00:20:34,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3731540.0, ans=0.125 2023-11-29 00:20:52,638 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-29 00:20:56,672 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6650, loss[loss=0.04419, simple_loss=0.05983, pruned_loss=0.005863, audio_tagging_loss=0.008415, over 15545.00 frames. ], tot_loss[loss=0.0639, simple_loss=0.08719, pruned_loss=0.01171, audio_tagging_loss=0.008592, over 3044088.88 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:21:10,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3731740.0, ans=0.1 2023-11-29 00:21:20,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2023-11-29 00:21:46,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-29 00:21:54,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-29 00:21:58,724 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6700, loss[loss=0.07435, simple_loss=0.1091, pruned_loss=0.01282, audio_tagging_loss=0.006958, over 16055.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08831, pruned_loss=0.01186, audio_tagging_loss=0.008478, over 3045577.65 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:22:02,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-29 00:22:04,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-29 00:22:24,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.131e+01 9.664e+01 1.036e+02 1.396e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 00:22:56,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-29 00:22:59,583 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6750, loss[loss=0.0597, simple_loss=0.08308, pruned_loss=0.0104, audio_tagging_loss=0.007755, over 15023.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08823, pruned_loss=0.01188, audio_tagging_loss=0.008473, over 3034943.37 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:23:06,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3732340.0, ans=0.2 2023-11-29 00:23:07,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=22.5 2023-11-29 00:23:18,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2023-11-29 00:23:25,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3732473.3333333335, ans=0.125 2023-11-29 00:23:27,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3732473.3333333335, ans=0.0 2023-11-29 00:23:48,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-29 00:23:58,410 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-29 00:23:59,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-29 00:24:01,808 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6800, loss[loss=0.0593, simple_loss=0.07907, pruned_loss=0.01215, audio_tagging_loss=0.00762, over 14314.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08806, pruned_loss=0.01197, audio_tagging_loss=0.008496, over 3037695.57 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:24:03,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732673.3333333335, ans=0.1 2023-11-29 00:24:08,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3732673.3333333335, ans=0.0 2023-11-29 00:24:22,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3732740.0, ans=0.125 2023-11-29 00:24:27,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.091e+01 9.725e+01 1.038e+02 3.036e+02, threshold=1.945e+02, percent-clipped=1.0 2023-11-29 00:24:32,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3732806.6666666665, ans=10.0 2023-11-29 00:24:52,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3732940.0, ans=0.125 2023-11-29 00:24:55,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3732940.0, ans=0.125 2023-11-29 00:24:55,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3732940.0, ans=0.2 2023-11-29 00:25:00,705 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-29 00:25:00,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3732940.0, ans=0.0 2023-11-29 00:25:03,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3733006.6666666665, 
ans=0.0 2023-11-29 00:25:04,144 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6850, loss[loss=0.05985, simple_loss=0.08461, pruned_loss=0.009151, audio_tagging_loss=0.008395, over 15304.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.0875, pruned_loss=0.01185, audio_tagging_loss=0.008504, over 3031158.30 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:25:24,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3733073.3333333335, ans=0.015 2023-11-29 00:25:25,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-11-29 00:25:31,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733140.0, ans=0.1 2023-11-29 00:25:34,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3733140.0, ans=0.0 2023-11-29 00:25:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3733206.6666666665, ans=0.07 2023-11-29 00:25:44,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3733206.6666666665, ans=0.0 2023-11-29 00:25:48,122 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:26:02,180 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-29 00:26:07,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3733340.0, ans=0.0 2023-11-29 00:26:08,428 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6900, loss[loss=0.07065, simple_loss=0.09838, pruned_loss=0.01329, audio_tagging_loss=0.008164, over 15588.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08885, pruned_loss=0.01195, audio_tagging_loss=0.00842, over 3043346.27 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:26:12,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2023-11-29 00:26:36,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.734e+01 9.536e+01 1.026e+02 1.241e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 00:26:38,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3733473.3333333335, ans=0.0 2023-11-29 00:26:46,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2023-11-29 00:26:56,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3733606.6666666665, ans=0.2 2023-11-29 00:26:57,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2023-11-29 00:26:57,770 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:27:00,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-29 00:27:07,519 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-29 00:27:07,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-29 00:27:10,746 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 6950, loss[loss=0.0639, simple_loss=0.08278, pruned_loss=0.01219, audio_tagging_loss=0.01032, over 15536.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08834, pruned_loss=0.01197, audio_tagging_loss=0.0085, over 3040982.86 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:27:21,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3733673.3333333335, ans=0.125 2023-11-29 00:27:25,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3733740.0, ans=0.125 2023-11-29 00:27:58,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3733873.3333333335, ans=0.0 2023-11-29 00:28:09,808 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-29 00:28:13,158 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7000, loss[loss=0.08537, simple_loss=0.1197, pruned_loss=0.01399, audio_tagging_loss=0.01153, over 14889.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08806, pruned_loss=0.0119, audio_tagging_loss=0.008579, over 3037568.15 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:28:17,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3734006.6666666665, ans=0.0 2023-11-29 00:28:31,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-29 00:28:38,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.937e+01 9.480e+01 1.049e+02 1.230e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 00:28:44,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. 
limit=15.0 2023-11-29 00:28:52,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3734206.6666666665, ans=0.125 2023-11-29 00:29:01,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3734273.3333333335, ans=0.125 2023-11-29 00:29:10,510 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-29 00:29:11,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3734273.3333333335, ans=0.125 2023-11-29 00:29:13,836 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7050, loss[loss=0.0737, simple_loss=0.1073, pruned_loss=0.01267, audio_tagging_loss=0.007378, over 14995.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08797, pruned_loss=0.01185, audio_tagging_loss=0.008679, over 3042514.25 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:29:31,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3734406.6666666665, ans=0.125 2023-11-29 00:29:38,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=12.0 2023-11-29 00:29:38,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-29 00:29:40,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-29 00:29:45,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5 2023-11-29 00:29:55,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3734540.0, ans=0.0 2023-11-29 00:30:01,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3734540.0, ans=0.07 2023-11-29 00:30:11,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-29 00:30:11,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3734606.6666666665, ans=0.2 2023-11-29 00:30:16,172 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7100, loss[loss=0.0491, simple_loss=0.06173, pruned_loss=0.008412, audio_tagging_loss=0.009823, over 14869.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08752, pruned_loss=0.01178, audio_tagging_loss=0.008747, over 3042696.25 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:30:22,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. 
limit=15.0 2023-11-29 00:30:24,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3734673.3333333335, ans=0.125 2023-11-29 00:30:43,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.863e+01 9.578e+01 1.032e+02 1.275e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 00:30:59,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3734873.3333333335, ans=0.2 2023-11-29 00:31:04,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3734940.0, ans=0.0 2023-11-29 00:31:08,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3734940.0, ans=0.125 2023-11-29 00:31:09,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3734940.0, ans=0.1 2023-11-29 00:31:13,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3734940.0, ans=0.2 2023-11-29 00:31:14,680 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-29 00:31:16,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0 2023-11-29 00:31:18,601 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7150, loss[loss=0.0567, simple_loss=0.07545, pruned_loss=0.009696, audio_tagging_loss=0.00928, over 15950.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08902, pruned_loss=0.01202, audio_tagging_loss=0.008662, over 3048907.55 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:31:29,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3735073.3333333335, ans=0.0 2023-11-29 00:31:31,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3735073.3333333335, ans=0.2 2023-11-29 00:31:50,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3735140.0, ans=0.125 2023-11-29 00:32:05,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3735206.6666666665, ans=0.2 2023-11-29 00:32:16,474 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-29 00:32:17,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-29 00:32:19,867 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7200, loss[loss=0.05641, simple_loss=0.07794, pruned_loss=0.008281, audio_tagging_loss=0.009158, over 16289.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08891, pruned_loss=0.01183, audio_tagging_loss=0.008765, over 3049431.47 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:32:32,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3735406.6666666665, ans=0.125 2023-11-29 00:32:47,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 8.965e+01 9.449e+01 1.037e+02 1.518e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 00:32:52,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3735473.3333333335, ans=0.125 2023-11-29 00:32:54,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-11-29 00:33:02,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2023-11-29 00:33:10,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3735606.6666666665, ans=0.0 2023-11-29 00:33:12,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3735606.6666666665, ans=0.125 2023-11-29 00:33:14,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3735606.6666666665, ans=0.2 2023-11-29 00:33:17,242 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-29 00:33:20,709 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7250, loss[loss=0.07101, simple_loss=0.0889, pruned_loss=0.01869, audio_tagging_loss=0.007868, over 14496.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08912, pruned_loss=0.01172, audio_tagging_loss=0.008773, over 3043323.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:33:43,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3735740.0, ans=0.125 2023-11-29 00:33:50,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3735806.6666666665, ans=0.125 2023-11-29 00:33:59,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=12.0 2023-11-29 00:34:06,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3735873.3333333335, ans=0.035 2023-11-29 00:34:18,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3735940.0, ans=0.125 2023-11-29 00:34:19,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-29 00:34:23,679 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7300, loss[loss=0.0643, simple_loss=0.08792, pruned_loss=0.01304, audio_tagging_loss=0.007301, over 15468.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0888, pruned_loss=0.01176, audio_tagging_loss=0.008685, over 3049787.35 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:34:28,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. 
limit=6.0 2023-11-29 00:34:48,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3736140.0, ans=0.2 2023-11-29 00:34:51,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.804e+01 9.526e+01 1.009e+02 1.275e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 00:34:52,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3736140.0, ans=0.2 2023-11-29 00:35:01,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3736206.6666666665, ans=0.0 2023-11-29 00:35:21,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-29 00:35:25,191 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7350, loss[loss=0.05358, simple_loss=0.07245, pruned_loss=0.01246, audio_tagging_loss=0.004892, over 14912.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08878, pruned_loss=0.01179, audio_tagging_loss=0.008603, over 3043947.36 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:35:25,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3736340.0, ans=0.0 2023-11-29 00:35:30,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-29 00:35:41,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3736406.6666666665, ans=0.1 2023-11-29 00:35:46,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-11-29 00:36:00,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3736473.3333333335, ans=0.0 2023-11-29 00:36:02,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3736540.0, ans=0.1 2023-11-29 00:36:15,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3736606.6666666665, ans=0.0 2023-11-29 00:36:20,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2023-11-29 00:36:23,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-29 00:36:26,689 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7400, loss[loss=0.06732, simple_loss=0.09229, pruned_loss=0.01239, audio_tagging_loss=0.008787, over 14919.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08919, pruned_loss=0.01198, audio_tagging_loss=0.008495, over 3044625.09 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:36:28,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. 
limit=15.0 2023-11-29 00:36:45,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3736740.0, ans=0.125 2023-11-29 00:36:56,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 9.052e+01 9.894e+01 1.069e+02 1.258e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 00:37:02,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=22.5 2023-11-29 00:37:03,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3736873.3333333335, ans=0.0 2023-11-29 00:37:20,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3736940.0, ans=0.2 2023-11-29 00:37:25,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-29 00:37:29,090 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7450, loss[loss=0.07955, simple_loss=0.1078, pruned_loss=0.01749, audio_tagging_loss=0.008154, over 16537.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.0892, pruned_loss=0.01201, audio_tagging_loss=0.008484, over 3044868.08 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:37:44,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3737073.3333333335, ans=0.125 2023-11-29 00:37:52,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3737073.3333333335, ans=0.0 2023-11-29 00:37:56,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=22.5 2023-11-29 00:38:09,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3737206.6666666665, ans=0.1 2023-11-29 00:38:15,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3737206.6666666665, ans=0.0 2023-11-29 00:38:23,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3737273.3333333335, ans=0.125 2023-11-29 00:38:26,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-29 00:38:30,345 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7500, loss[loss=0.06503, simple_loss=0.0887, pruned_loss=0.01075, audio_tagging_loss=0.009926, over 14529.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08917, pruned_loss=0.01194, audio_tagging_loss=0.008468, over 3047191.15 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:38:58,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.888e+01 9.650e+01 1.039e+02 1.258e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 00:39:20,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3737606.6666666665, ans=0.1 2023-11-29 00:39:29,096 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-29 00:39:32,458 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7550, loss[loss=0.05217, simple_loss=0.06683, pruned_loss=0.0102, audio_tagging_loss=0.00855, over 15774.00 frames. 
], tot_loss[loss=0.06426, simple_loss=0.08785, pruned_loss=0.01181, audio_tagging_loss=0.008523, over 3048757.40 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:39:36,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-11-29 00:39:37,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3737673.3333333335, ans=0.025 2023-11-29 00:40:00,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3737806.6666666665, ans=0.1 2023-11-29 00:40:17,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3737873.3333333335, ans=0.125 2023-11-29 00:40:17,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3737873.3333333335, ans=0.0 2023-11-29 00:40:30,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3737940.0, ans=0.125 2023-11-29 00:40:31,000 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-29 00:40:34,518 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7600, loss[loss=0.07678, simple_loss=0.1078, pruned_loss=0.01618, audio_tagging_loss=0.006703, over 15139.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08766, pruned_loss=0.01185, audio_tagging_loss=0.008598, over 3054965.54 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:40:47,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-29 00:41:02,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3738140.0, ans=0.0 2023-11-29 00:41:03,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 9.023e+01 9.623e+01 1.078e+02 1.517e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 00:41:17,063 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:41:21,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738206.6666666665, ans=0.1 2023-11-29 00:41:21,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3738206.6666666665, ans=0.2 2023-11-29 00:41:23,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3738273.3333333335, ans=0.2 2023-11-29 00:41:28,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3738273.3333333335, ans=0.0 2023-11-29 00:41:33,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-29 00:41:37,294 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7650, loss[loss=0.06335, simple_loss=0.08812, pruned_loss=0.01089, audio_tagging_loss=0.008395, over 14563.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08774, pruned_loss=0.01175, audio_tagging_loss=0.008526, over 3051776.43 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:41:45,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3738340.0, ans=0.0 2023-11-29 00:41:55,186 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:42:14,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3738540.0, ans=0.2 2023-11-29 00:42:29,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3738606.6666666665, ans=0.0 2023-11-29 00:42:34,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-29 00:42:38,566 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7700, loss[loss=0.05648, simple_loss=0.07645, pruned_loss=0.007927, audio_tagging_loss=0.01033, over 13923.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08774, pruned_loss=0.01176, audio_tagging_loss=0.008507, over 3055566.06 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:42:52,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3738740.0, ans=0.0 2023-11-29 00:43:02,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3738806.6666666665, ans=0.2 2023-11-29 00:43:08,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.118e+01 9.854e+01 1.042e+02 1.331e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 00:43:18,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3738873.3333333335, ans=0.125 2023-11-29 00:43:34,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2023-11-29 00:43:35,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5 2023-11-29 00:43:36,758 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-29 00:43:40,132 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7750, loss[loss=0.07694, simple_loss=0.1063, pruned_loss=0.01568, audio_tagging_loss=0.008111, over 15623.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08785, pruned_loss=0.01178, audio_tagging_loss=0.008482, over 3054676.52 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:43:40,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3739006.6666666665, ans=0.2 2023-11-29 00:43:52,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3739073.3333333335, ans=0.05 2023-11-29 00:43:55,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3739073.3333333335, ans=0.125 2023-11-29 00:44:00,795 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:44:09,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. 
limit=15.0 2023-11-29 00:44:14,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3739140.0, ans=0.125 2023-11-29 00:44:25,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2023-11-29 00:44:26,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-11-29 00:44:38,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-29 00:44:42,123 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7800, loss[loss=0.06564, simple_loss=0.09195, pruned_loss=0.01191, audio_tagging_loss=0.007755, over 16267.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08741, pruned_loss=0.01175, audio_tagging_loss=0.008553, over 3052929.71 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:45:11,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.840e+01 9.485e+01 1.019e+02 1.348e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 00:45:25,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3739540.0, ans=0.125 2023-11-29 00:45:30,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3739540.0, ans=0.125 2023-11-29 00:45:41,189 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-29 00:45:44,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-29 00:45:44,487 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7850, loss[loss=0.06326, simple_loss=0.07765, pruned_loss=0.01307, audio_tagging_loss=0.01137, over 15312.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.0884, pruned_loss=0.012, audio_tagging_loss=0.008509, over 3046803.24 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:45:47,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3739673.3333333335, ans=0.125 2023-11-29 00:46:00,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3739740.0, ans=0.125 2023-11-29 00:46:13,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3739806.6666666665, ans=0.125 2023-11-29 00:46:14,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3739806.6666666665, ans=0.0 2023-11-29 00:46:42,619 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-29 00:46:46,333 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7900, loss[loss=0.06279, simple_loss=0.08671, pruned_loss=0.00918, audio_tagging_loss=0.01026, over 15674.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08889, pruned_loss=0.01221, audio_tagging_loss=0.008636, over 3042475.99 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:46:47,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3740006.6666666665, ans=0.1 2023-11-29 00:47:16,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.161e+01 9.815e+01 1.045e+02 1.564e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-29 00:47:18,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3740140.0, ans=0.125 2023-11-29 00:47:44,242 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-29 00:47:45,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3740273.3333333335, ans=0.09899494936611666 2023-11-29 00:47:48,097 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 7950, loss[loss=0.07103, simple_loss=0.1066, pruned_loss=0.01152, audio_tagging_loss=0.006205, over 15552.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08958, pruned_loss=0.01228, audio_tagging_loss=0.008655, over 3040226.76 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:48:05,054 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:48:07,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-11-29 00:48:11,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3740473.3333333335, ans=0.1 2023-11-29 00:48:29,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2023-11-29 00:48:45,485 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-29 00:48:48,860 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8000, loss[loss=0.07856, simple_loss=0.1107, pruned_loss=0.01446, audio_tagging_loss=0.008729, over 15024.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08917, pruned_loss=0.01219, audio_tagging_loss=0.00876, over 3033880.31 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:49:18,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.940e+01 9.510e+01 1.008e+02 1.278e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 00:49:30,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3740873.3333333335, ans=0.1 2023-11-29 00:49:37,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. 
limit=15.0 2023-11-29 00:49:46,774 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-29 00:49:51,191 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8050, loss[loss=0.04349, simple_loss=0.05972, pruned_loss=0.004926, audio_tagging_loss=0.008702, over 15198.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08828, pruned_loss=0.01204, audio_tagging_loss=0.008862, over 3036644.28 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:49:51,403 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:49:51,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3741006.6666666665, ans=0.0 2023-11-29 00:50:35,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2023-11-29 00:50:48,745 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-29 00:50:52,558 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8100, loss[loss=0.04875, simple_loss=0.06148, pruned_loss=0.009016, audio_tagging_loss=0.008995, over 16176.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08919, pruned_loss=0.01221, audio_tagging_loss=0.008776, over 3043027.57 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:50:56,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3741340.0, ans=0.125 2023-11-29 00:50:57,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3741340.0, ans=0.125 2023-11-29 00:51:04,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2023-11-29 00:51:22,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.038e+01 9.545e+01 1.047e+02 1.336e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 00:51:28,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3741540.0, ans=0.5 2023-11-29 00:51:39,157 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:51:43,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-11-29 00:51:50,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-29 00:51:53,583 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8150, loss[loss=0.09203, simple_loss=0.1472, pruned_loss=0.01424, audio_tagging_loss=0.004172, over 16928.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.0906, pruned_loss=0.0124, audio_tagging_loss=0.008607, over 3042525.09 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:14,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2023-11-29 00:52:15,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. 
limit=15.0 2023-11-29 00:52:27,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3741806.6666666665, ans=0.125 2023-11-29 00:52:44,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3741940.0, ans=0.2 2023-11-29 00:52:46,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3741940.0, ans=0.125 2023-11-29 00:52:46,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3741940.0, ans=10.0 2023-11-29 00:52:51,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-29 00:52:55,184 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8200, loss[loss=0.05451, simple_loss=0.0731, pruned_loss=0.007585, audio_tagging_loss=0.01038, over 15555.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0909, pruned_loss=0.01248, audio_tagging_loss=0.008538, over 3039007.96 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:58,181 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:53:24,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2023-11-29 00:53:25,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 9.120e+01 9.757e+01 1.054e+02 1.290e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 00:53:27,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3742140.0, ans=0.0 2023-11-29 00:53:28,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3742140.0, ans=0.125 2023-11-29 00:53:36,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3742206.6666666665, ans=0.0 2023-11-29 00:53:36,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3742206.6666666665, ans=0.125 2023-11-29 00:53:53,525 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-29 00:53:56,564 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:53:56,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3742340.0, ans=0.2 2023-11-29 00:53:57,526 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8250, loss[loss=0.04943, simple_loss=0.0588, pruned_loss=0.006511, audio_tagging_loss=0.01352, over 16162.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09013, pruned_loss=0.01215, audio_tagging_loss=0.008531, over 3045962.92 frames. 
], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:54:12,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3742406.6666666665, ans=0.1 2023-11-29 00:54:22,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3742473.3333333335, ans=10.0 2023-11-29 00:54:33,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2023-11-29 00:54:43,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-11-29 00:54:55,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-29 00:54:59,457 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8300, loss[loss=0.06046, simple_loss=0.08011, pruned_loss=0.01148, audio_tagging_loss=0.008925, over 15106.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08982, pruned_loss=0.01216, audio_tagging_loss=0.008542, over 3042486.94 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:54:59,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3742673.3333333335, ans=0.125 2023-11-29 00:55:13,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.52 vs. limit=22.5 2023-11-29 00:55:29,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.057e+01 9.720e+01 1.032e+02 1.351e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 00:55:32,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3742806.6666666665, ans=0.125 2023-11-29 00:55:48,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3742940.0, ans=0.125 2023-11-29 00:55:56,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-29 00:55:57,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3742940.0, ans=0.05 2023-11-29 00:55:59,628 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8350, loss[loss=0.04417, simple_loss=0.06033, pruned_loss=0.004452, audio_tagging_loss=0.009548, over 14137.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08972, pruned_loss=0.01185, audio_tagging_loss=0.008466, over 3039425.02 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:56:04,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3743006.6666666665, ans=0.0 2023-11-29 00:56:09,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2023-11-29 00:56:15,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3743073.3333333335, ans=0.2 2023-11-29 00:56:35,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3743206.6666666665, ans=0.125 2023-11-29 00:56:45,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3743206.6666666665, ans=0.125 2023-11-29 00:56:55,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3743273.3333333335, ans=0.125 2023-11-29 00:56:58,013 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-29 00:57:01,974 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8400, loss[loss=0.07542, simple_loss=0.1019, pruned_loss=0.01716, audio_tagging_loss=0.007302, over 15303.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08866, pruned_loss=0.0117, audio_tagging_loss=0.008526, over 3033254.59 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:57:15,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3743406.6666666665, ans=0.125 2023-11-29 00:57:31,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=12.0 2023-11-29 00:57:31,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.851e+01 9.361e+01 9.943e+01 1.259e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-29 00:57:52,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-29 00:57:52,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-29 00:58:00,078 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-29 00:58:00,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3743606.6666666665, ans=0.0 2023-11-29 00:58:03,561 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8450, loss[loss=0.07645, simple_loss=0.1042, pruned_loss=0.01697, audio_tagging_loss=0.007396, over 14969.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08828, pruned_loss=0.01167, audio_tagging_loss=0.0085, over 3034122.82 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:58:06,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3743673.3333333335, ans=0.125 2023-11-29 00:58:59,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-29 00:59:01,230 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-29 00:59:04,920 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8500, loss[loss=0.07307, simple_loss=0.1091, pruned_loss=0.01397, audio_tagging_loss=0.004569, over 15737.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08939, pruned_loss=0.01196, audio_tagging_loss=0.008556, over 3039663.92 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:59:07,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-29 00:59:25,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3744073.3333333335, ans=10.0 2023-11-29 00:59:25,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2023-11-29 00:59:38,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 8.998e+01 9.683e+01 1.039e+02 1.237e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-29 00:59:54,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3744273.3333333335, ans=0.1 2023-11-29 01:00:03,098 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-29 01:00:06,550 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8550, loss[loss=0.06857, simple_loss=0.09703, pruned_loss=0.01168, audio_tagging_loss=0.008368, over 16988.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09036, pruned_loss=0.01214, audio_tagging_loss=0.008478, over 3046299.54 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:00:12,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=22.5 2023-11-29 01:00:25,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3744406.6666666665, ans=0.0 2023-11-29 01:00:43,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3744540.0, ans=0.125 2023-11-29 01:00:43,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3744540.0, ans=0.1 2023-11-29 01:01:05,781 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-29 01:01:09,152 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8600, loss[loss=0.06376, simple_loss=0.07966, pruned_loss=0.01399, audio_tagging_loss=0.009941, over 15400.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09046, pruned_loss=0.01233, audio_tagging_loss=0.00852, over 3048151.95 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:01:11,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5 2023-11-29 01:01:28,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=15.0 2023-11-29 01:01:30,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3744740.0, ans=0.125 2023-11-29 01:01:40,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.937e+01 9.624e+01 1.044e+02 4.545e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-29 01:01:40,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3744806.6666666665, ans=0.125 2023-11-29 01:02:00,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3744940.0, ans=0.125 2023-11-29 01:02:06,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-29 01:02:10,006 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8650, loss[loss=0.08209, simple_loss=0.1195, pruned_loss=0.01624, audio_tagging_loss=0.006095, over 15431.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09032, pruned_loss=0.01238, audio_tagging_loss=0.008583, over 3047741.31 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:02:25,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745073.3333333335, ans=0.1 2023-11-29 01:02:46,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=12.0 2023-11-29 01:02:51,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3745206.6666666665, ans=0.125 2023-11-29 01:03:07,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-29 01:03:11,061 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8700, loss[loss=0.06491, simple_loss=0.1085, pruned_loss=0.006461, audio_tagging_loss=0.004199, over 15235.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09095, pruned_loss=0.01231, audio_tagging_loss=0.008627, over 3050617.91 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:03:20,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3745340.0, ans=0.125 2023-11-29 01:03:27,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3745406.6666666665, ans=0.125 2023-11-29 01:03:32,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3745406.6666666665, ans=0.0 2023-11-29 01:03:44,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.978e+01 9.587e+01 1.045e+02 1.358e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:03:54,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=15.0 2023-11-29 01:03:56,604 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:03:57,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3745540.0, ans=0.2 2023-11-29 01:03:57,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3745540.0, ans=0.0 2023-11-29 01:04:10,128 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-29 01:04:13,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3745673.3333333335, ans=0.2 2023-11-29 01:04:14,709 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8750, loss[loss=0.07767, simple_loss=0.1141, pruned_loss=0.01588, audio_tagging_loss=0.004761, over 14879.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09087, pruned_loss=0.01219, audio_tagging_loss=0.008648, over 3047818.46 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-29 01:04:35,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3745740.0, ans=0.125 2023-11-29 01:04:42,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3745806.6666666665, ans=0.0 2023-11-29 01:04:44,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:04:50,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3745873.3333333335, ans=0.125 2023-11-29 01:04:50,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2023-11-29 01:05:11,946 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-29 01:05:15,326 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8800, loss[loss=0.05552, simple_loss=0.07667, pruned_loss=0.008322, audio_tagging_loss=0.008863, over 16699.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09141, pruned_loss=0.01232, audio_tagging_loss=0.008753, over 3048247.61 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:05:49,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.250e+01 9.769e+01 1.064e+02 1.773e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 01:05:53,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3746206.6666666665, ans=0.2 2023-11-29 01:05:56,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3746206.6666666665, ans=0.125 2023-11-29 01:06:12,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-29 01:06:16,837 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8850, loss[loss=0.07301, simple_loss=0.09196, pruned_loss=0.01544, audio_tagging_loss=0.01158, over 15533.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09005, pruned_loss=0.012, audio_tagging_loss=0.008793, over 3042943.47 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:06:27,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.78 vs. 
limit=22.5 2023-11-29 01:06:30,491 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:06:55,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3746540.0, ans=0.07 2023-11-29 01:06:58,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3746540.0, ans=0.0 2023-11-29 01:07:14,940 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-29 01:07:19,284 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8900, loss[loss=0.05432, simple_loss=0.06577, pruned_loss=0.01192, audio_tagging_loss=0.009509, over 15599.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08993, pruned_loss=0.012, audio_tagging_loss=0.008622, over 3046019.64 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:07:20,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3746673.3333333335, ans=0.125 2023-11-29 01:07:21,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2023-11-29 01:07:25,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3746673.3333333335, ans=0.0 2023-11-29 01:07:31,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746740.0, ans=0.1 2023-11-29 01:07:43,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3746806.6666666665, ans=0.0 2023-11-29 01:07:51,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3746806.6666666665, ans=0.07 2023-11-29 01:07:51,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3746806.6666666665, ans=0.07 2023-11-29 01:07:52,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.904e+01 9.438e+01 1.016e+02 1.537e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 01:08:05,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. 
limit=12.0 2023-11-29 01:08:06,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3746873.3333333335, ans=0.125 2023-11-29 01:08:11,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3746940.0, ans=0.2 2023-11-29 01:08:16,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3746940.0, ans=0.04949747468305833 2023-11-29 01:08:17,365 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-29 01:08:20,644 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 8950, loss[loss=0.06336, simple_loss=0.07711, pruned_loss=0.01388, audio_tagging_loss=0.01092, over 14377.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09041, pruned_loss=0.01213, audio_tagging_loss=0.008502, over 3053926.80 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:08:25,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3747006.6666666665, ans=0.0 2023-11-29 01:08:42,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3747073.3333333335, ans=0.0 2023-11-29 01:08:58,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3747206.6666666665, ans=0.1 2023-11-29 01:09:06,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3747206.6666666665, ans=0.125 2023-11-29 01:09:16,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3747273.3333333335, ans=0.125 2023-11-29 01:09:17,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-29 01:09:21,838 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9000, loss[loss=0.06558, simple_loss=0.09093, pruned_loss=0.01261, audio_tagging_loss=0.007502, over 15789.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0904, pruned_loss=0.01217, audio_tagging_loss=0.008364, over 3052990.57 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:09:21,839 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 01:10:02,094 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05855, simple_loss=0.05046, pruned_loss=0.005347, audio_tagging_loss=0.02798, over 4681554.00 frames. 
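The loss fields in these train_asr.py entries appear to be linearly related: across the batches logged here, loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g., for the validation entry just above, 0.5 * 0.05046 + 0.005347 + 0.02798 ≈ 0.05855). Below is a minimal sketch of that relation, assuming a 0.5 weight on simple_loss, an unscaled pruned_loss, and a unit weight on audio_tagging_loss; the function name and signature are hypothetical reconstructions from the logged numbers, not code from train_asr.py.

    def reported_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Assumed reconstruction of the "loss" field in these log entries:
        # a weighted sum of the three logged components.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Consistent with the Epoch 47 validation entry above:
    assert abs(reported_loss(0.05046, 0.005347, 0.02798) - 0.05855) < 1e-4
    # ... and with a training entry (Epoch 47, batch 6850):
    assert abs(reported_loss(0.08750, 0.011850, 0.008504) - 0.06411) < 1e-4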
2023-11-29 01:10:02,095 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 01:10:07,016 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:10:15,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3747406.6666666665, ans=0.125 2023-11-29 01:10:34,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.906e+01 9.620e+01 1.037e+02 1.250e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 01:10:34,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3747473.3333333335, ans=0.0 2023-11-29 01:10:55,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3747606.6666666665, ans=0.125 2023-11-29 01:10:59,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-29 01:10:59,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5 2023-11-29 01:11:03,483 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9050, loss[loss=0.06444, simple_loss=0.08145, pruned_loss=0.01232, audio_tagging_loss=0.0114, over 14984.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08999, pruned_loss=0.01199, audio_tagging_loss=0.008318, over 3060362.53 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:11:24,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3747740.0, ans=0.125 2023-11-29 01:11:58,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3747940.0, ans=0.125 2023-11-29 01:12:01,683 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-29 01:12:05,354 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9100, loss[loss=0.06616, simple_loss=0.09124, pruned_loss=0.01212, audio_tagging_loss=0.008419, over 16268.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08939, pruned_loss=0.0119, audio_tagging_loss=0.008317, over 3061759.46 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:12:05,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3748006.6666666665, ans=0.0 2023-11-29 01:12:09,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.73 vs. limit=22.5 2023-11-29 01:12:27,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3748073.3333333335, ans=0.2 2023-11-29 01:12:38,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.957e+01 8.933e+01 9.563e+01 1.020e+02 1.667e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 01:12:38,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3748140.0, ans=0.2 2023-11-29 01:12:48,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.62 vs. 
limit=22.5 2023-11-29 01:12:52,306 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:13:03,042 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-29 01:13:06,526 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9150, loss[loss=0.05611, simple_loss=0.06575, pruned_loss=0.01115, audio_tagging_loss=0.01208, over 15546.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08938, pruned_loss=0.01196, audio_tagging_loss=0.008339, over 3051453.65 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:13:20,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3748406.6666666665, ans=0.0 2023-11-29 01:13:47,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3748540.0, ans=0.0 2023-11-29 01:13:47,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2023-11-29 01:13:52,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3748540.0, ans=0.2 2023-11-29 01:13:55,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3748606.6666666665, ans=0.125 2023-11-29 01:14:04,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-29 01:14:08,053 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9200, loss[loss=0.08147, simple_loss=0.117, pruned_loss=0.01699, audio_tagging_loss=0.005998, over 16351.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08971, pruned_loss=0.01211, audio_tagging_loss=0.008366, over 3051679.54 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:14:19,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:14:24,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3748740.0, ans=0.125 2023-11-29 01:14:41,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.164e+01 9.710e+01 1.033e+02 1.295e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 01:14:47,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3748873.3333333335, ans=0.125 2023-11-29 01:14:55,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.62 vs. limit=22.5 2023-11-29 01:15:06,336 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-29 01:15:10,325 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9250, loss[loss=0.05864, simple_loss=0.07889, pruned_loss=0.009583, audio_tagging_loss=0.009612, over 15137.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08858, pruned_loss=0.01198, audio_tagging_loss=0.008406, over 3056366.67 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:15:11,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3749006.6666666665, ans=0.09899494936611666 2023-11-29 01:15:15,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-11-29 01:15:26,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3749073.3333333335, ans=0.07 2023-11-29 01:15:28,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3749073.3333333335, ans=0.0 2023-11-29 01:16:07,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-29 01:16:11,605 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9300, loss[loss=0.05448, simple_loss=0.08032, pruned_loss=0.006289, audio_tagging_loss=0.008027, over 15017.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08868, pruned_loss=0.01187, audio_tagging_loss=0.008394, over 3054601.76 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:16:45,424 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 9.046e+01 9.645e+01 1.038e+02 1.624e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 01:16:56,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3749540.0, ans=0.1 2023-11-29 01:17:09,977 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-29 01:17:11,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3749606.6666666665, ans=0.2 2023-11-29 01:17:11,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3749606.6666666665, ans=0.125 2023-11-29 01:17:13,319 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9350, loss[loss=0.06734, simple_loss=0.09838, pruned_loss=0.01209, audio_tagging_loss=0.006065, over 16214.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08954, pruned_loss=0.01189, audio_tagging_loss=0.008337, over 3052228.42 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:17:36,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0 2023-11-29 01:17:37,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-11-29 01:17:49,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3749873.3333333335, ans=0.0 2023-11-29 01:17:51,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=22.5 2023-11-29 01:17:59,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=6.0 2023-11-29 01:18:06,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.61 vs. 
limit=15.0 2023-11-29 01:18:10,497 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-29 01:18:10,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3749940.0, ans=10.0 2023-11-29 01:18:15,242 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9400, loss[loss=0.0648, simple_loss=0.08848, pruned_loss=0.01234, audio_tagging_loss=0.00823, over 15590.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08972, pruned_loss=0.01204, audio_tagging_loss=0.00845, over 3046270.44 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:18:20,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2023-11-29 01:18:38,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2023-11-29 01:18:48,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.214e+01 9.788e+01 1.040e+02 1.202e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 01:19:01,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3750206.6666666665, ans=0.125 2023-11-29 01:19:12,663 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-29 01:19:16,057 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9450, loss[loss=0.07745, simple_loss=0.1096, pruned_loss=0.01591, audio_tagging_loss=0.006731, over 15152.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08922, pruned_loss=0.01179, audio_tagging_loss=0.008619, over 3050801.50 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:19:17,776 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:19:59,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3750540.0, ans=0.125 2023-11-29 01:20:02,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3750540.0, ans=0.2 2023-11-29 01:20:11,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750606.6666666665, ans=0.1 2023-11-29 01:20:15,355 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-29 01:20:19,015 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9500, loss[loss=0.05029, simple_loss=0.0706, pruned_loss=0.007523, audio_tagging_loss=0.007469, over 12972.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0891, pruned_loss=0.01184, audio_tagging_loss=0.00869, over 3054891.95 frames. 
], batch size: 52, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:20:42,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3750806.6666666665, ans=0.125 2023-11-29 01:20:45,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3750806.6666666665, ans=0.125 2023-11-29 01:20:45,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3750806.6666666665, ans=0.125 2023-11-29 01:20:47,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3750806.6666666665, ans=0.125 2023-11-29 01:20:53,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.884e+01 9.617e+01 1.043e+02 1.271e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 01:21:06,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3750873.3333333335, ans=0.125 2023-11-29 01:21:17,087 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-29 01:21:20,501 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9550, loss[loss=0.05421, simple_loss=0.0716, pruned_loss=0.008422, audio_tagging_loss=0.00999, over 15348.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08944, pruned_loss=0.01195, audio_tagging_loss=0.008734, over 3049781.88 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:21:30,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=22.5 2023-11-29 01:22:06,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751206.6666666665, ans=0.1 2023-11-29 01:22:10,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2023-11-29 01:22:19,449 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-29 01:22:22,940 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9600, loss[loss=0.06512, simple_loss=0.08903, pruned_loss=0.01253, audio_tagging_loss=0.008073, over 15168.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08881, pruned_loss=0.01178, audio_tagging_loss=0.008764, over 3050711.43 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:22:57,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.993e+01 9.667e+01 1.037e+02 1.328e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 01:23:02,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3751540.0, ans=0.025 2023-11-29 01:23:20,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3751606.6666666665, ans=0.0 2023-11-29 01:23:21,968 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-29 01:23:23,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751606.6666666665, ans=0.1 2023-11-29 01:23:25,377 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9650, loss[loss=0.06736, simple_loss=0.09542, pruned_loss=0.01021, audio_tagging_loss=0.009437, over 16175.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08866, pruned_loss=0.01181, audio_tagging_loss=0.008747, over 3045420.24 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:23:32,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3751673.3333333335, ans=10.0 2023-11-29 01:23:35,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3751673.3333333335, ans=0.0 2023-11-29 01:23:45,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3751740.0, ans=0.0 2023-11-29 01:23:51,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0 2023-11-29 01:24:15,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3751940.0, ans=0.0 2023-11-29 01:24:22,989 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-29 01:24:26,724 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9700, loss[loss=0.04961, simple_loss=0.06347, pruned_loss=0.007504, audio_tagging_loss=0.01037, over 14845.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08813, pruned_loss=0.01188, audio_tagging_loss=0.008664, over 3046632.01 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:24:30,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3752006.6666666665, ans=0.07 2023-11-29 01:24:46,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2023-11-29 01:25:01,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.050e+01 9.542e+01 1.032e+02 1.533e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:25:17,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752273.3333333335, ans=0.1 2023-11-29 01:25:24,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-29 01:25:28,336 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9750, loss[loss=0.0792, simple_loss=0.1143, pruned_loss=0.01489, audio_tagging_loss=0.007151, over 15203.00 frames. 
], tot_loss[loss=0.06456, simple_loss=0.08822, pruned_loss=0.01188, audio_tagging_loss=0.008566, over 3045484.96 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:25:39,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3752340.0, ans=0.0 2023-11-29 01:25:44,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3752406.6666666665, ans=0.125 2023-11-29 01:25:54,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752473.3333333335, ans=0.1 2023-11-29 01:25:54,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2023-11-29 01:26:19,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3752606.6666666665, ans=0.07 2023-11-29 01:26:20,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3752606.6666666665, ans=0.1 2023-11-29 01:26:25,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3752606.6666666665, ans=0.2 2023-11-29 01:26:25,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.70 vs. limit=10.0 2023-11-29 01:26:28,004 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-29 01:26:29,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3752606.6666666665, ans=0.0 2023-11-29 01:26:31,452 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9800, loss[loss=0.04412, simple_loss=0.05041, pruned_loss=0.008503, audio_tagging_loss=0.01041, over 15053.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08812, pruned_loss=0.01184, audio_tagging_loss=0.008486, over 3048600.35 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:26:45,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3752740.0, ans=0.0 2023-11-29 01:26:50,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3752740.0, ans=0.0 2023-11-29 01:26:59,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3752806.6666666665, ans=0.125 2023-11-29 01:27:01,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3752806.6666666665, ans=0.2 2023-11-29 01:27:02,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=12.0 2023-11-29 01:27:04,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3752806.6666666665, ans=0.0 2023-11-29 01:27:04,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.999e+01 9.540e+01 1.035e+02 1.290e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:27:27,951 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:27:28,025 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-29 01:27:31,233 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9850, loss[loss=0.06189, simple_loss=0.09126, pruned_loss=0.007294, audio_tagging_loss=0.008965, over 15523.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08895, pruned_loss=0.01182, audio_tagging_loss=0.008436, over 3040723.33 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:28:00,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2023-11-29 01:28:11,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-29 01:28:11,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-29 01:28:22,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3753273.3333333335, ans=0.125 2023-11-29 01:28:24,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3753273.3333333335, ans=0.2 2023-11-29 01:28:29,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-29 01:28:29,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3753273.3333333335, ans=0.125 2023-11-29 01:28:33,645 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9900, loss[loss=0.06217, simple_loss=0.07807, pruned_loss=0.01236, audio_tagging_loss=0.01078, over 14932.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08856, pruned_loss=0.01189, audio_tagging_loss=0.0084, over 3036462.77 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:29:09,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.970e+01 9.713e+01 1.037e+02 1.358e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 01:29:12,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3753540.0, ans=0.2 2023-11-29 01:29:12,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3753540.0, ans=0.0 2023-11-29 01:29:31,834 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-29 01:29:35,965 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 9950, loss[loss=0.04662, simple_loss=0.06596, pruned_loss=0.003732, audio_tagging_loss=0.009914, over 15237.00 frames. ], tot_loss[loss=0.06397, simple_loss=0.08779, pruned_loss=0.01169, audio_tagging_loss=0.008391, over 3039733.53 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:29:41,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3753673.3333333335, ans=0.125 2023-11-29 01:29:49,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-11-29 01:30:06,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3753806.6666666665, ans=0.1 2023-11-29 01:30:06,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3753806.6666666665, ans=0.0 2023-11-29 01:30:06,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-29 01:30:12,270 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:30:17,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3753873.3333333335, ans=0.125 2023-11-29 01:30:17,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3753873.3333333335, ans=0.125 2023-11-29 01:30:24,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3753940.0, ans=0.0 2023-11-29 01:30:33,913 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-29 01:30:37,324 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10000, loss[loss=0.06988, simple_loss=0.09738, pruned_loss=0.01413, audio_tagging_loss=0.007062, over 16152.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08807, pruned_loss=0.01167, audio_tagging_loss=0.008353, over 3047933.15 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:30:49,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-11-29 01:31:02,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3754140.0, ans=0.0 2023-11-29 01:31:13,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3754140.0, ans=0.09899494936611666 2023-11-29 01:31:13,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 9.037e+01 9.619e+01 1.035e+02 1.339e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 01:31:23,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3754206.6666666665, ans=0.125 2023-11-29 01:31:30,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3754273.3333333335, ans=0.0 2023-11-29 01:31:31,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3754273.3333333335, ans=22.5 2023-11-29 01:31:33,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. 
limit=15.0 2023-11-29 01:31:34,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=22.5 2023-11-29 01:31:35,960 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-29 01:31:36,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2023-11-29 01:31:39,225 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10050, loss[loss=0.07365, simple_loss=0.109, pruned_loss=0.01204, audio_tagging_loss=0.007096, over 15931.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08782, pruned_loss=0.01168, audio_tagging_loss=0.008368, over 3050519.85 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:32:04,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3754473.3333333335, ans=0.125 2023-11-29 01:32:09,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3754473.3333333335, ans=0.125 2023-11-29 01:32:15,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3754540.0, ans=0.125 2023-11-29 01:32:27,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3754606.6666666665, ans=0.125 2023-11-29 01:32:34,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3754606.6666666665, ans=0.125 2023-11-29 01:32:37,247 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-29 01:32:37,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3754606.6666666665, ans=0.125 2023-11-29 01:32:41,636 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10100, loss[loss=0.07448, simple_loss=0.1013, pruned_loss=0.01686, audio_tagging_loss=0.006983, over 15788.00 frames. ], tot_loss[loss=0.06343, simple_loss=0.08673, pruned_loss=0.01153, audio_tagging_loss=0.008538, over 3045850.75 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:32:54,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.14 vs. limit=15.0 2023-11-29 01:32:59,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=15.0 2023-11-29 01:33:11,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2023-11-29 01:33:17,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.162e+01 9.791e+01 1.075e+02 1.682e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 01:33:24,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3754873.3333333335, ans=0.0 2023-11-29 01:33:33,293 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:33:39,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3754940.0, ans=0.125 2023-11-29 01:33:39,922 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-29 01:33:43,342 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10150, loss[loss=0.06119, simple_loss=0.07843, pruned_loss=0.01218, audio_tagging_loss=0.009797, over 14119.00 frames. ], tot_loss[loss=0.06376, simple_loss=0.08716, pruned_loss=0.01155, audio_tagging_loss=0.008632, over 3048078.25 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:33:57,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3755073.3333333335, ans=0.125 2023-11-29 01:34:12,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3755140.0, ans=0.0 2023-11-29 01:34:13,513 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:34:23,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-29 01:34:36,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3755273.3333333335, ans=0.0 2023-11-29 01:34:38,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3755273.3333333335, ans=0.0 2023-11-29 01:34:40,822 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-29 01:34:44,796 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10200, loss[loss=0.05471, simple_loss=0.07404, pruned_loss=0.008487, audio_tagging_loss=0.009198, over 15815.00 frames. ], tot_loss[loss=0.06343, simple_loss=0.08665, pruned_loss=0.01145, audio_tagging_loss=0.008659, over 3050213.00 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:34:45,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3755340.0, ans=0.04949747468305833 2023-11-29 01:34:51,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3755340.0, ans=0.05 2023-11-29 01:35:01,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-29 01:35:01,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-29 01:35:06,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-29 01:35:07,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755406.6666666665, ans=0.1 2023-11-29 01:35:09,618 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:35:12,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3755473.3333333335, ans=0.125 2023-11-29 01:35:21,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 9.157e+01 9.642e+01 1.029e+02 1.501e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 01:35:22,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3755540.0, ans=0.07 2023-11-29 01:35:26,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3755540.0, ans=0.125 2023-11-29 01:35:26,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. limit=10.0 2023-11-29 01:35:42,288 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-29 01:35:46,287 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10250, loss[loss=0.07553, simple_loss=0.09739, pruned_loss=0.01761, audio_tagging_loss=0.009225, over 14198.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08822, pruned_loss=0.0119, audio_tagging_loss=0.008704, over 3056245.87 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:35:46,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3755673.3333333335, ans=0.125 2023-11-29 01:35:49,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3755673.3333333335, ans=0.125 2023-11-29 01:35:55,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
limit=6.0 2023-11-29 01:36:26,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=15.0 2023-11-29 01:36:43,874 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-29 01:36:47,557 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10300, loss[loss=0.07679, simple_loss=0.09052, pruned_loss=0.0176, audio_tagging_loss=0.01392, over 16324.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08924, pruned_loss=0.01204, audio_tagging_loss=0.008714, over 3056604.25 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:37:06,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3756073.3333333335, ans=0.0 2023-11-29 01:37:13,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3756140.0, ans=0.1 2023-11-29 01:37:24,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3756206.6666666665, ans=0.125 2023-11-29 01:37:25,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.072e+01 9.694e+01 1.048e+02 1.558e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 01:37:28,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3756206.6666666665, ans=0.0 2023-11-29 01:37:40,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-29 01:37:41,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3756273.3333333335, ans=0.1 2023-11-29 01:37:41,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3756273.3333333335, ans=0.1 2023-11-29 01:37:46,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-29 01:37:49,665 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10350, loss[loss=0.08095, simple_loss=0.1041, pruned_loss=0.01734, audio_tagging_loss=0.01158, over 15667.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08927, pruned_loss=0.01204, audio_tagging_loss=0.008817, over 3059711.70 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:37:54,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3756340.0, ans=0.1 2023-11-29 01:38:01,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3756406.6666666665, ans=0.125 2023-11-29 01:38:03,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3756406.6666666665, ans=0.125 2023-11-29 01:38:07,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3756406.6666666665, ans=0.09899494936611666 2023-11-29 01:38:12,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3756406.6666666665, ans=0.2 2023-11-29 01:38:14,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=22.5 2023-11-29 01:38:31,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3756540.0, ans=0.0 2023-11-29 01:38:47,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-29 01:38:51,254 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10400, loss[loss=0.08558, simple_loss=0.1221, pruned_loss=0.01671, audio_tagging_loss=0.007817, over 16316.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08886, pruned_loss=0.01196, audio_tagging_loss=0.009079, over 3061614.89 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:38:58,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3756673.3333333335, ans=0.1 2023-11-29 01:39:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3756673.3333333335, ans=0.125 2023-11-29 01:39:04,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-29 01:39:11,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-29 01:39:25,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3756806.6666666665, ans=0.1 2023-11-29 01:39:28,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.225e+01 9.627e+01 1.037e+02 1.431e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 01:39:45,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3756940.0, ans=0.125 2023-11-29 01:39:45,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=22.5 2023-11-29 01:39:49,680 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-29 01:39:53,060 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10450, loss[loss=0.04491, simple_loss=0.05956, pruned_loss=0.006531, audio_tagging_loss=0.008602, over 17016.00 frames. 
], tot_loss[loss=0.06574, simple_loss=0.08962, pruned_loss=0.01198, audio_tagging_loss=0.008951, over 3061844.02 frames. ], batch size: 66, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:40:10,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3757073.3333333335, ans=0.0 2023-11-29 01:40:32,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3757206.6666666665, ans=0.125 2023-11-29 01:40:49,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-29 01:40:54,487 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10500, loss[loss=0.05913, simple_loss=0.08203, pruned_loss=0.01051, audio_tagging_loss=0.007608, over 14461.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08944, pruned_loss=0.01206, audio_tagging_loss=0.008817, over 3056455.71 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:41:16,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3757406.6666666665, ans=0.125 2023-11-29 01:41:16,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=22.5 2023-11-29 01:41:21,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757473.3333333335, ans=0.1 2023-11-29 01:41:28,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757473.3333333335, ans=0.1 2023-11-29 01:41:28,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3757473.3333333335, ans=0.0 2023-11-29 01:41:31,334 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.902e+01 9.602e+01 1.050e+02 1.360e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 01:41:46,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2023-11-29 01:41:52,591 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-29 01:41:52,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3757606.6666666665, ans=0.0 2023-11-29 01:41:55,920 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10550, loss[loss=0.07114, simple_loss=0.1011, pruned_loss=0.01298, audio_tagging_loss=0.007621, over 15320.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08974, pruned_loss=0.01212, audio_tagging_loss=0.008673, over 3053751.80 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:42:00,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=12.0 2023-11-29 01:42:15,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3757740.0, ans=0.125 2023-11-29 01:42:16,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.12 vs. 
limit=12.0 2023-11-29 01:42:26,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3757806.6666666665, ans=0.0 2023-11-29 01:42:33,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3757873.3333333335, ans=0.0 2023-11-29 01:42:35,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3757873.3333333335, ans=0.125 2023-11-29 01:42:44,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3757940.0, ans=0.0 2023-11-29 01:42:52,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3757940.0, ans=0.1 2023-11-29 01:42:54,233 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-29 01:42:57,615 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10600, loss[loss=0.07324, simple_loss=0.09692, pruned_loss=0.01542, audio_tagging_loss=0.009364, over 15539.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09051, pruned_loss=0.01231, audio_tagging_loss=0.008539, over 3045688.57 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:42:58,026 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.514e-03 2023-11-29 01:43:02,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.79 vs. limit=10.0 2023-11-29 01:43:03,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-29 01:43:22,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3758140.0, ans=0.125 2023-11-29 01:43:34,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.127e+01 9.716e+01 1.043e+02 1.257e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 01:43:43,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-29 01:43:43,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-11-29 01:43:54,696 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-29 01:43:58,045 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10650, loss[loss=0.05936, simple_loss=0.08361, pruned_loss=0.008445, audio_tagging_loss=0.009113, over 15065.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08956, pruned_loss=0.01219, audio_tagging_loss=0.008555, over 3036066.36 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:44:16,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3758406.6666666665, ans=0.125 2023-11-29 01:44:31,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3758473.3333333335, ans=0.125 2023-11-29 01:44:40,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.23 vs. 
limit=10.0 2023-11-29 01:44:40,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3758540.0, ans=0.125 2023-11-29 01:44:48,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3758606.6666666665, ans=0.125 2023-11-29 01:44:56,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-29 01:45:00,052 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10700, loss[loss=0.07774, simple_loss=0.1095, pruned_loss=0.01577, audio_tagging_loss=0.007239, over 14390.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08932, pruned_loss=0.01208, audio_tagging_loss=0.008499, over 3025349.04 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:45:07,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3758673.3333333335, ans=0.0 2023-11-29 01:45:07,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3758673.3333333335, ans=0.1 2023-11-29 01:45:13,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3758740.0, ans=0.125 2023-11-29 01:45:14,192 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:45:15,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3758740.0, ans=0.07 2023-11-29 01:45:23,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3758806.6666666665, ans=0.2 2023-11-29 01:45:23,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3758806.6666666665, ans=0.125 2023-11-29 01:45:28,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=12.0 2023-11-29 01:45:29,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3758806.6666666665, ans=0.09899494936611666 2023-11-29 01:45:37,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.610e+01 9.369e+01 1.025e+02 1.277e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 01:45:41,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758873.3333333335, ans=0.1 2023-11-29 01:45:47,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=22.5 2023-11-29 01:45:58,445 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-29 01:46:01,885 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10750, loss[loss=0.06494, simple_loss=0.09206, pruned_loss=0.01052, audio_tagging_loss=0.008399, over 14883.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08944, pruned_loss=0.01194, audio_tagging_loss=0.008543, over 3028857.50 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:46:10,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3759006.6666666665, ans=0.0 2023-11-29 01:46:18,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2023-11-29 01:46:32,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0 2023-11-29 01:46:39,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:45,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:58,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3759273.3333333335, ans=0.2 2023-11-29 01:46:59,045 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-29 01:47:02,474 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10800, loss[loss=0.05342, simple_loss=0.06973, pruned_loss=0.01056, audio_tagging_loss=0.007995, over 15161.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08957, pruned_loss=0.01198, audio_tagging_loss=0.008383, over 3032893.31 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:47:28,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3759473.3333333335, ans=0.0 2023-11-29 01:47:33,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=22.5 2023-11-29 01:47:41,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.091e+01 9.540e+01 1.017e+02 1.841e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:47:43,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3759540.0, ans=0.0 2023-11-29 01:47:59,057 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:48:00,881 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-29 01:48:02,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3759606.6666666665, ans=0.0 2023-11-29 01:48:04,367 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10850, loss[loss=0.06191, simple_loss=0.08948, pruned_loss=0.009042, audio_tagging_loss=0.008131, over 15413.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08912, pruned_loss=0.01193, audio_tagging_loss=0.008451, over 3030132.88 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:48:04,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2023-11-29 01:48:10,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3759673.3333333335, ans=0.125 2023-11-29 01:48:12,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.40 vs. 
limit=12.0 2023-11-29 01:48:23,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3759740.0, ans=0.07 2023-11-29 01:48:32,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3759806.6666666665, ans=0.0 2023-11-29 01:48:42,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759873.3333333335, ans=0.1 2023-11-29 01:48:49,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=22.5 2023-11-29 01:48:49,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3759873.3333333335, ans=0.125 2023-11-29 01:48:49,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3759873.3333333335, ans=0.2 2023-11-29 01:48:51,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3759873.3333333335, ans=0.125 2023-11-29 01:48:59,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. limit=10.0 2023-11-29 01:49:01,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3759940.0, ans=0.0 2023-11-29 01:49:03,696 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-29 01:49:09,346 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:49:10,537 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10900, loss[loss=0.06353, simple_loss=0.0834, pruned_loss=0.01356, audio_tagging_loss=0.008263, over 15747.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08924, pruned_loss=0.01202, audio_tagging_loss=0.008505, over 3028274.76 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:49:19,983 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:49:33,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3760140.0, ans=0.0 2023-11-29 01:49:41,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. 
limit=15.0 2023-11-29 01:49:42,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3760140.0, ans=0.125 2023-11-29 01:49:44,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3760140.0, ans=0.125 2023-11-29 01:49:48,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.098e+01 9.720e+01 1.048e+02 1.470e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 01:49:55,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-29 01:50:07,935 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-29 01:50:08,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3760273.3333333335, ans=0.2 2023-11-29 01:50:11,366 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 10950, loss[loss=0.06102, simple_loss=0.08299, pruned_loss=0.0102, audio_tagging_loss=0.009328, over 15786.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08839, pruned_loss=0.01186, audio_tagging_loss=0.008637, over 3027959.60 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:50:15,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3760340.0, ans=0.07 2023-11-29 01:50:20,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3760340.0, ans=0.125 2023-11-29 01:51:08,314 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-29 01:51:12,310 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11000, loss[loss=0.06148, simple_loss=0.08762, pruned_loss=0.009902, audio_tagging_loss=0.007764, over 15107.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08804, pruned_loss=0.0118, audio_tagging_loss=0.008678, over 3033307.91 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:51:15,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=15.0 2023-11-29 01:51:23,486 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:51:23,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3760740.0, ans=0.125 2023-11-29 01:51:32,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3760740.0, ans=0.0 2023-11-29 01:51:38,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. 
limit=15.0 2023-11-29 01:51:41,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3760806.6666666665, ans=0.125 2023-11-29 01:51:48,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3760873.3333333335, ans=0.125 2023-11-29 01:51:48,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3760873.3333333335, ans=0.0 2023-11-29 01:51:51,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.080e+01 9.821e+01 1.045e+02 1.365e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 01:51:51,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3760873.3333333335, ans=0.0 2023-11-29 01:51:57,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-29 01:52:08,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3760940.0, ans=0.125 2023-11-29 01:52:09,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-29 01:52:14,035 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11050, loss[loss=0.06068, simple_loss=0.08009, pruned_loss=0.009924, audio_tagging_loss=0.01071, over 15210.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08781, pruned_loss=0.01176, audio_tagging_loss=0.008815, over 3032608.29 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:52:14,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3761006.6666666665, ans=0.2 2023-11-29 01:52:15,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3761006.6666666665, ans=0.1 2023-11-29 01:52:37,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3761140.0, ans=0.125 2023-11-29 01:52:45,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.19 vs. limit=10.0 2023-11-29 01:52:54,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3761206.6666666665, ans=0.125 2023-11-29 01:53:06,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2023-11-29 01:53:12,888 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-29 01:53:16,676 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11100, loss[loss=0.06577, simple_loss=0.08582, pruned_loss=0.01487, audio_tagging_loss=0.007987, over 14235.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08821, pruned_loss=0.01195, audio_tagging_loss=0.008927, over 3040361.16 frames. 
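
(The loss records above decompose consistently: tot_loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, so the pruned-transducer simple loss appears to carry weight 0.5 and the audio-tagging distillation loss weight 1.0. Below is a quick arithmetic check against the Epoch 47, batch 10700 record; the weighting is inferred from the logged numbers, not read from train_asr.py.)

    # Values copied from the "Epoch 47, batch 10700" record above.
    simple_loss = 0.08932
    pruned_loss = 0.01208
    audio_tagging_loss = 0.008499

    # Inferred combination: loss = 0.5 * simple + pruned + 1.0 * audio_tagging
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 5))  # -> 0.06524, matching the logged loss for this batch
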
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:53:56,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.017e+01 9.701e+01 1.046e+02 1.396e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 01:54:08,263 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:54:13,914 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-29 01:54:15,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3761606.6666666665, ans=0.125 2023-11-29 01:54:17,353 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11150, loss[loss=0.04795, simple_loss=0.05977, pruned_loss=0.006654, audio_tagging_loss=0.01141, over 15361.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08854, pruned_loss=0.01193, audio_tagging_loss=0.009026, over 3044188.93 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:54:57,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3761873.3333333335, ans=0.125 2023-11-29 01:55:01,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3761873.3333333335, ans=0.07 2023-11-29 01:55:11,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3761940.0, ans=0.0 2023-11-29 01:55:15,836 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-29 01:55:16,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3761940.0, ans=0.125 2023-11-29 01:55:19,843 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11200, loss[loss=0.07094, simple_loss=0.1003, pruned_loss=0.01319, audio_tagging_loss=0.00761, over 16010.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08887, pruned_loss=0.01201, audio_tagging_loss=0.009003, over 3048971.15 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:55:43,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-11-29 01:55:48,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=22.5 2023-11-29 01:55:58,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.132e+01 8.938e+01 9.585e+01 1.042e+02 1.236e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:56:12,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. 
limit=22.5 2023-11-29 01:56:14,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3762273.3333333335, ans=0.125 2023-11-29 01:56:18,000 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-29 01:56:18,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762273.3333333335, ans=0.1 2023-11-29 01:56:19,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3762273.3333333335, ans=0.0 2023-11-29 01:56:20,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3762340.0, ans=0.125 2023-11-29 01:56:21,347 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11250, loss[loss=0.07638, simple_loss=0.1004, pruned_loss=0.0176, audio_tagging_loss=0.008591, over 15689.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08969, pruned_loss=0.01207, audio_tagging_loss=0.008853, over 3052156.18 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:56:21,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3762340.0, ans=0.2 2023-11-29 01:56:31,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3762406.6666666665, ans=0.015 2023-11-29 01:56:50,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3762473.3333333335, ans=0.0 2023-11-29 01:56:51,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3762473.3333333335, ans=0.1 2023-11-29 01:56:54,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3762473.3333333335, ans=0.1 2023-11-29 01:56:58,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3762540.0, ans=0.125 2023-11-29 01:56:59,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3762540.0, ans=0.09899494936611666 2023-11-29 01:57:19,368 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-29 01:57:23,269 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11300, loss[loss=0.06333, simple_loss=0.0912, pruned_loss=0.01191, audio_tagging_loss=0.005823, over 14609.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08987, pruned_loss=0.01207, audio_tagging_loss=0.008674, over 3048039.22 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:57:31,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3762673.3333333335, ans=0.0 2023-11-29 01:57:33,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3762673.3333333335, ans=0.125 2023-11-29 01:57:33,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0 2023-11-29 01:58:00,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. 
limit=6.0 2023-11-29 01:58:04,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.082e+01 9.505e+01 1.023e+02 1.248e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 01:58:08,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3762873.3333333335, ans=0.125 2023-11-29 01:58:21,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-29 01:58:24,976 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11350, loss[loss=0.08102, simple_loss=0.1163, pruned_loss=0.01553, audio_tagging_loss=0.00734, over 16816.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08904, pruned_loss=0.01182, audio_tagging_loss=0.008598, over 3054685.22 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:58:35,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3763006.6666666665, ans=0.95 2023-11-29 01:58:48,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3763140.0, ans=0.1 2023-11-29 01:59:14,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3763273.3333333335, ans=0.125 2023-11-29 01:59:22,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-29 01:59:26,225 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11400, loss[loss=0.05569, simple_loss=0.07522, pruned_loss=0.009854, audio_tagging_loss=0.008219, over 14890.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08957, pruned_loss=0.01194, audio_tagging_loss=0.008513, over 3048830.46 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:59:30,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3763340.0, ans=0.0 2023-11-29 01:59:38,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3763406.6666666665, ans=0.0 2023-11-29 01:59:41,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763406.6666666665, ans=0.1 2023-11-29 01:59:43,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3763406.6666666665, ans=0.0 2023-11-29 01:59:50,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-29 01:59:50,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763473.3333333335, ans=0.1 2023-11-29 02:00:06,801 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.845e+01 9.645e+01 1.031e+02 1.502e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:00:12,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.14 vs. 
limit=15.0 2023-11-29 02:00:20,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3763606.6666666665, ans=0.2 2023-11-29 02:00:23,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-29 02:00:27,294 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11450, loss[loss=0.0729, simple_loss=0.1027, pruned_loss=0.01348, audio_tagging_loss=0.008079, over 14535.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08974, pruned_loss=0.01194, audio_tagging_loss=0.008469, over 3045431.83 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:00:28,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3763673.3333333335, ans=0.0 2023-11-29 02:00:39,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3763740.0, ans=0.125 2023-11-29 02:01:02,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763873.3333333335, ans=0.1 2023-11-29 02:01:15,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3763940.0, ans=0.125 2023-11-29 02:01:24,794 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-29 02:01:28,665 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11500, loss[loss=0.07232, simple_loss=0.09912, pruned_loss=0.01432, audio_tagging_loss=0.008437, over 15190.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08984, pruned_loss=0.01207, audio_tagging_loss=0.008444, over 3053452.97 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:02:09,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.942e+01 9.581e+01 1.022e+02 1.259e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 02:02:17,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3764273.3333333335, ans=0.125 2023-11-29 02:02:17,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3764273.3333333335, ans=0.1 2023-11-29 02:02:24,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764273.3333333335, ans=0.1 2023-11-29 02:02:27,439 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-29 02:02:30,876 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11550, loss[loss=0.08198, simple_loss=0.104, pruned_loss=0.01875, audio_tagging_loss=0.01122, over 14735.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08931, pruned_loss=0.01198, audio_tagging_loss=0.008461, over 3053013.95 frames. 
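
(In every optim.py record above, the clipping threshold is almost exactly Clipping_scale times the logged median gradient norm, e.g. 2.0 * 9.645e+01 = 1.929e+02 in the record just above, which suggests clipping against a multiple of a running median of recent norms rather than a fixed constant. A sketch of that scheme follows; the window size and update rule are assumptions, and the real ScaledAdam logic differs in detail.)

    from collections import deque
    import statistics

    import torch

    class MedianGradClipper:
        """Clip gradient norms against clipping_scale * median of recent norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total grad norms

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(
                torch.stack([p.grad.detach().norm() for p in params])
            ).item()
            self.norms.append(norm)
            threshold = self.clipping_scale * statistics.median(self.norms)
            if norm > threshold:  # such batches count toward "percent-clipped"
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm
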
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:02:39,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3764340.0, ans=0.125 2023-11-29 02:02:45,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3764406.6666666665, ans=0.125 2023-11-29 02:02:47,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3764406.6666666665, ans=0.125 2023-11-29 02:03:00,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-29 02:03:08,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-29 02:03:10,520 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:03:15,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3764540.0, ans=0.0 2023-11-29 02:03:21,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3764606.6666666665, ans=0.125 2023-11-29 02:03:28,027 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-29 02:03:32,120 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11600, loss[loss=0.06201, simple_loss=0.08262, pruned_loss=0.008958, audio_tagging_loss=0.01175, over 15809.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08984, pruned_loss=0.01207, audio_tagging_loss=0.008505, over 3055250.48 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:03:33,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=12.0 2023-11-29 02:03:37,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3764673.3333333335, ans=0.09899494936611666 2023-11-29 02:03:38,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3764673.3333333335, ans=0.2 2023-11-29 02:03:56,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3764806.6666666665, ans=0.2 2023-11-29 02:04:13,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 8.912e+01 9.637e+01 1.036e+02 1.418e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 02:04:17,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3764873.3333333335, ans=0.0 2023-11-29 02:04:20,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. 
limit=6.0 2023-11-29 02:04:29,754 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-29 02:04:33,237 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11650, loss[loss=0.09936, simple_loss=0.1385, pruned_loss=0.02224, audio_tagging_loss=0.007882, over 15987.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08996, pruned_loss=0.01198, audio_tagging_loss=0.008581, over 3048888.58 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:04:43,952 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:04:52,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2023-11-29 02:05:11,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3765206.6666666665, ans=0.2 2023-11-29 02:05:13,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.84 vs. limit=15.0 2023-11-29 02:05:20,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3765206.6666666665, ans=0.0 2023-11-29 02:05:27,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3765273.3333333335, ans=0.125 2023-11-29 02:05:29,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3765273.3333333335, ans=0.1 2023-11-29 02:05:31,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-29 02:05:34,765 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11700, loss[loss=0.07076, simple_loss=0.0993, pruned_loss=0.01493, audio_tagging_loss=0.006184, over 13916.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0898, pruned_loss=0.01206, audio_tagging_loss=0.008635, over 3037760.35 frames. 
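
(The WARNING lines that exclude AudioSet cuts follow from the 4x convolutional subsampling: a 1-second cut has 100 feature frames, which shrink to ((100 - 7) // 2 + 1) // 2 = 23 frames, fewer than the 24 BPE tokens of the dummy transcript, so the transducer loss is infeasible and the cut is dropped. A small filter expressing that check; the helper names are illustrative, not the train_asr.py functions.)

    def frames_after_subsampling(num_frames: int) -> int:
        # Matches the warnings above: 100 frames -> 23 frames.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose subsampled length cannot cover the token sequence."""
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # -> 23
    print(keep_cut(100, 24))              # -> False: excluded from training
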
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:05:38,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3765340.0, ans=0.0 2023-11-29 02:05:42,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765340.0, ans=0.1 2023-11-29 02:05:45,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3765406.6666666665, ans=0.125 2023-11-29 02:05:46,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3765406.6666666665, ans=0.1 2023-11-29 02:05:47,716 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:05:55,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765406.6666666665, ans=0.1 2023-11-29 02:06:16,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.764e+01 9.502e+01 1.022e+02 1.260e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:06:29,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3765606.6666666665, ans=0.1 2023-11-29 02:06:32,076 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-29 02:06:35,564 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11750, loss[loss=0.07361, simple_loss=0.1012, pruned_loss=0.01502, audio_tagging_loss=0.008001, over 15977.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08989, pruned_loss=0.01203, audio_tagging_loss=0.008594, over 3038603.60 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:06:39,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3765673.3333333335, ans=0.0 2023-11-29 02:06:47,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3765740.0, ans=0.125 2023-11-29 02:06:52,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-29 02:07:06,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=22.5 2023-11-29 02:07:13,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.37 vs. limit=22.5 2023-11-29 02:07:14,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3765873.3333333335, ans=0.0 2023-11-29 02:07:27,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3765940.0, ans=0.2 2023-11-29 02:07:27,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.38 vs. 
limit=12.0 2023-11-29 02:07:28,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3765940.0, ans=0.0 2023-11-29 02:07:34,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-29 02:07:37,667 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11800, loss[loss=0.05817, simple_loss=0.07382, pruned_loss=0.01281, audio_tagging_loss=0.008456, over 15258.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08951, pruned_loss=0.01184, audio_tagging_loss=0.008582, over 3043752.20 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:07:57,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2023-11-29 02:08:06,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3766140.0, ans=0.125 2023-11-29 02:08:06,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.46 vs. limit=6.0 2023-11-29 02:08:17,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.243e+01 9.839e+01 1.045e+02 1.336e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 02:08:27,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3766273.3333333335, ans=15.0 2023-11-29 02:08:27,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3766273.3333333335, ans=0.0 2023-11-29 02:08:35,205 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-29 02:08:36,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3766273.3333333335, ans=0.125 2023-11-29 02:08:38,652 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11850, loss[loss=0.07587, simple_loss=0.1071, pruned_loss=0.01485, audio_tagging_loss=0.007466, over 14244.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08975, pruned_loss=0.01174, audio_tagging_loss=0.008673, over 3047920.55 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:08:41,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3766340.0, ans=0.125 2023-11-29 02:08:45,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0 2023-11-29 02:08:49,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3766406.6666666665, ans=0.125 2023-11-29 02:08:53,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3766406.6666666665, ans=0.0 2023-11-29 02:09:28,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2023-11-29 02:09:31,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. 
limit=15.0 2023-11-29 02:09:34,962 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-29 02:09:38,742 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11900, loss[loss=0.07462, simple_loss=0.09978, pruned_loss=0.01564, audio_tagging_loss=0.009089, over 15843.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08971, pruned_loss=0.01182, audio_tagging_loss=0.008756, over 3043508.13 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:09:51,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3766740.0, ans=0.2 2023-11-29 02:09:55,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3766740.0, ans=0.1 2023-11-29 02:10:17,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-11-29 02:10:19,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.853e+01 9.543e+01 1.009e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 02:10:31,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3766940.0, ans=0.125 2023-11-29 02:10:34,529 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-29 02:10:35,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3766940.0, ans=0.125 2023-11-29 02:10:37,879 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 11950, loss[loss=0.06982, simple_loss=0.09068, pruned_loss=0.01613, audio_tagging_loss=0.008343, over 14550.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08931, pruned_loss=0.01191, audio_tagging_loss=0.008844, over 3046697.01 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:10:47,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3767006.6666666665, ans=0.1 2023-11-29 02:11:12,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3767140.0, ans=0.125 2023-11-29 02:11:12,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3767140.0, ans=0.125 2023-11-29 02:11:20,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3767206.6666666665, ans=0.0 2023-11-29 02:11:31,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3767273.3333333335, ans=0.0 2023-11-29 02:11:32,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3767273.3333333335, ans=0.5 2023-11-29 02:11:33,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-29 02:11:34,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3767273.3333333335, ans=0.125 2023-11-29 02:11:36,592 INFO [train_asr.py:1235] (1/4) Epoch 47, batch 12000, loss[loss=0.07278, simple_loss=0.1054, pruned_loss=0.01268, audio_tagging_loss=0.0074, over 14617.00 frames. 
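
(The Whitening records compare a per-module metric against a fixed limit, e.g. metric=13.49 vs. limit=15.0 just above. The metric measures how far the activation covariance is from a multiple of the identity; a plausible formulation, and roughly what scaling.py appears to compute, is E[lambda^2] / (E[lambda])^2 over the covariance eigenvalues, which equals 1.0 for perfectly white features and grows as a few directions dominate. A penalty gradient is applied only while the metric exceeds the limit. A sketch under that assumption:)

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """E[lambda^2] / (E[lambda])^2 of the feature covariance (assumed form).

        x: (num_frames, num_channels). Returns 1.0 for perfectly "white"
        features; larger values mean a few directions dominate the variance.
        """
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]                    # (C, C) covariance
        num_channels = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()           # E[lambda] = trace / C
        mean_eig_sq = (cov * cov).sum() / num_channels  # E[lambda^2] = trace(cov^2) / C
        return (mean_eig_sq / mean_eig**2).item()

    x = torch.randn(1000, 512)   # near-white input
    print(whitening_metric(x))   # ~1.5 for random data at this shape
    x[:, 0] *= 20.0              # make one channel dominate
    print(whitening_metric(x))   # rises far above a limit like 15.0
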
], tot_loss[loss=0.06529, simple_loss=0.08894, pruned_loss=0.01187, audio_tagging_loss=0.008948, over 3054846.85 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:11:36,593 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 02:11:58,859 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9932, 2.9142, 2.6774, 2.7349, 3.2341, 3.3868, 3.1343, 3.6053], device='cuda:1') 2023-11-29 02:12:16,760 INFO [train_asr.py:1267] (1/4) Epoch 47, validation: loss=0.05799, simple_loss=0.0505, pruned_loss=0.005391, audio_tagging_loss=0.02735, over 4681554.00 frames. 2023-11-29 02:12:16,765 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 02:12:18,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3767340.0, ans=0.1 2023-11-29 02:12:19,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2023-11-29 02:13:00,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-29 02:13:01,521 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 0, loss[loss=0.05902, simple_loss=0.05929, pruned_loss=0.006695, audio_tagging_loss=0.02269, over 15568.00 frames. ], tot_loss[loss=0.05902, simple_loss=0.05929, pruned_loss=0.006695, audio_tagging_loss=0.02269, over 15568.00 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:13:01,522 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 02:13:36,860 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05814, simple_loss=0.05045, pruned_loss=0.005317, audio_tagging_loss=0.02759, over 4681554.00 frames. 2023-11-29 02:13:36,861 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 02:13:50,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.52 vs. limit=6.0 2023-11-29 02:13:50,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.361e+01 1.012e+02 1.115e+02 1.422e+02, threshold=2.023e+02, percent-clipped=0.0 2023-11-29 02:13:58,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3767560.0, ans=0.0 2023-11-29 02:14:08,496 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-29 02:14:23,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3767693.3333333335, ans=0.5 2023-11-29 02:14:40,346 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 50, loss[loss=0.07384, simple_loss=0.08092, pruned_loss=0.0121, audio_tagging_loss=0.02128, over 14082.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.08525, pruned_loss=0.01106, audio_tagging_loss=0.01677, over 689695.91 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:04,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3767960.0, ans=0.0 2023-11-29 02:15:06,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=15.0 2023-11-29 02:15:09,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3767960.0, ans=0.0 2023-11-29 02:15:09,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3767960.0, ans=0.125 2023-11-29 02:15:10,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-29 02:15:36,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3768093.3333333335, ans=0.125 2023-11-29 02:15:40,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3768093.3333333335, ans=0.125 2023-11-29 02:15:42,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2023-11-29 02:15:43,432 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 100, loss[loss=0.05888, simple_loss=0.07164, pruned_loss=0.008415, audio_tagging_loss=0.01464, over 14739.00 frames. ], tot_loss[loss=0.07274, simple_loss=0.08982, pruned_loss=0.01189, audio_tagging_loss=0.01594, over 1217108.66 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:44,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-29 02:15:44,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3768160.0, ans=0.125 2023-11-29 02:15:56,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.236e+01 9.896e+01 1.062e+02 1.155e+02 1.316e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-29 02:16:02,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3768226.6666666665, ans=0.2 2023-11-29 02:16:10,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3768293.3333333335, ans=0.125 2023-11-29 02:16:12,236 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-29 02:16:21,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3768360.0, ans=0.0 2023-11-29 02:16:32,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3768426.6666666665, ans=0.2 2023-11-29 02:16:35,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-11-29 02:16:36,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768426.6666666665, ans=0.1 2023-11-29 02:16:43,704 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 150, loss[loss=0.05332, simple_loss=0.05741, pruned_loss=0.007844, audio_tagging_loss=0.01677, over 15195.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.0912, pruned_loss=0.01226, audio_tagging_loss=0.01411, over 1620067.37 frames. 
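
(After each "Computing validation loss" pass, the trainer reports the CUDA high-water mark, "Maximum memory allocated so far is 25568MB" above. That figure is the allocator's peak statistic; below is a minimal sketch of producing the same line, with the wrapper name being illustrative.)

    import logging

    import torch

    def log_peak_cuda_memory(device: torch.device) -> None:
        # max_memory_allocated() is a high-water mark since process start
        # (or since the last torch.cuda.reset_peak_memory_stats(device)).
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        logging.info("Maximum memory allocated so far is %dMB", mb)

    if torch.cuda.is_available():
        log_peak_cuda_memory(torch.device("cuda:0"))
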
], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:16:46,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3768493.3333333335, ans=0.125 2023-11-29 02:16:53,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-29 02:17:07,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3768626.6666666665, ans=0.0 2023-11-29 02:17:14,167 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-29 02:17:46,597 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 200, loss[loss=0.05985, simple_loss=0.0752, pruned_loss=0.01017, audio_tagging_loss=0.01207, over 14737.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09139, pruned_loss=0.0123, audio_tagging_loss=0.01247, over 1937988.31 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:00,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3768893.3333333335, ans=0.0 2023-11-29 02:18:02,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.150e+01 9.879e+01 1.074e+02 1.273e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 02:18:04,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3768893.3333333335, ans=0.0 2023-11-29 02:18:11,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3768960.0, ans=0.95 2023-11-29 02:18:16,735 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-29 02:18:38,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3769093.3333333335, ans=0.0 2023-11-29 02:18:49,032 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 250, loss[loss=0.05833, simple_loss=0.07764, pruned_loss=0.009853, audio_tagging_loss=0.009655, over 14527.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09132, pruned_loss=0.01251, audio_tagging_loss=0.0113, over 2182236.49 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:51,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:18:52,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:19:12,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3769293.3333333335, ans=0.125 2023-11-29 02:19:18,413 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-29 02:19:23,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769293.3333333335, ans=0.1 2023-11-29 02:19:24,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3769360.0, ans=0.04949747468305833 2023-11-29 02:19:34,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3769360.0, ans=0.0 2023-11-29 02:19:34,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3769360.0, ans=0.0 2023-11-29 02:19:45,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3769426.6666666665, ans=0.0 2023-11-29 02:19:51,090 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 300, loss[loss=0.0477, simple_loss=0.05393, pruned_loss=0.01071, audio_tagging_loss=0.01003, over 14839.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.0907, pruned_loss=0.01243, audio_tagging_loss=0.0105, over 2368946.84 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:20:05,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.202e+01 9.824e+01 1.066e+02 1.297e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 02:20:11,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0 2023-11-29 02:20:19,704 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-29 02:20:25,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2023-11-29 02:20:51,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3769826.6666666665, ans=0.1 2023-11-29 02:20:52,717 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 350, loss[loss=0.07852, simple_loss=0.1126, pruned_loss=0.01503, audio_tagging_loss=0.0072, over 15747.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09048, pruned_loss=0.01221, audio_tagging_loss=0.009951, over 2517152.09 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:21:02,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=22.5 2023-11-29 02:21:22,216 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-29 02:21:44,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3770093.3333333335, ans=0.125 2023-11-29 02:21:53,379 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 400, loss[loss=0.08151, simple_loss=0.1153, pruned_loss=0.01694, audio_tagging_loss=0.00693, over 15597.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0902, pruned_loss=0.012, audio_tagging_loss=0.009612, over 2636437.52 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:22:09,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.956e+01 9.429e+01 1.009e+02 1.369e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:22:09,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3770226.6666666665, ans=0.2 2023-11-29 02:22:09,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:10,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:23,957 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-29 02:22:39,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3770360.0, ans=0.125 2023-11-29 02:22:40,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3770360.0, ans=0.2 2023-11-29 02:22:51,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3770426.6666666665, ans=0.0 2023-11-29 02:22:55,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-29 02:22:56,257 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 450, loss[loss=0.06073, simple_loss=0.08088, pruned_loss=0.01312, audio_tagging_loss=0.00717, over 14866.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09024, pruned_loss=0.01199, audio_tagging_loss=0.009334, over 2726356.65 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:23:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3770560.0, ans=0.125 2023-11-29 02:23:22,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. 
limit=15.0 2023-11-29 02:23:23,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3770626.6666666665, ans=0.09899494936611666 2023-11-29 02:23:24,951 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-29 02:23:40,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770693.3333333335, ans=0.1 2023-11-29 02:23:50,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3770760.0, ans=0.0 2023-11-29 02:23:57,794 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 500, loss[loss=0.06018, simple_loss=0.07931, pruned_loss=0.01098, audio_tagging_loss=0.009547, over 14678.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08956, pruned_loss=0.01183, audio_tagging_loss=0.009187, over 2795530.54 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:23:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3770826.6666666665, ans=0.125 2023-11-29 02:24:03,906 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:24:12,854 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.944e+01 9.480e+01 1.026e+02 1.531e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 02:24:22,527 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:24:26,929 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-29 02:24:28,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:29,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3770960.0, ans=0.05 2023-11-29 02:24:39,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3771026.6666666665, ans=0.0 2023-11-29 02:24:42,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3771026.6666666665, ans=0.125 2023-11-29 02:24:58,605 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 550, loss[loss=0.07214, simple_loss=0.09772, pruned_loss=0.01401, audio_tagging_loss=0.009272, over 15020.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08922, pruned_loss=0.01176, audio_tagging_loss=0.009068, over 2856248.77 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:25:02,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3771160.0, ans=0.015 2023-11-29 02:25:28,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-29 02:25:41,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3771360.0, ans=0.125 2023-11-29 02:25:46,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3771426.6666666665, ans=0.07 2023-11-29 02:26:00,433 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 600, loss[loss=0.05044, simple_loss=0.06251, pruned_loss=0.01051, audio_tagging_loss=0.00867, over 13969.00 frames. 
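
(Across the epoch 47 -> 48 boundary above, the logged lr steps from 1.43e-03 to 1.41e-03 while barely moving within an epoch, which is consistent with an Eden-style schedule that decays smoothly in both the global batch index and the epoch. A sketch of that assumed form, with the actual constants left as parameters rather than filled in:)

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float, lr_epochs: float) -> float:
        """Assumed Eden-style schedule: smooth decay in batch and epoch.

        At batch counts this large the batch factor changes very slowly,
        while the epoch factor still shrinks a little each epoch, matching
        the small 1.43e-03 -> 1.41e-03 step at the epoch boundary above.
        """
        batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
        return base_lr * batch_factor * epoch_factor
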
], tot_loss[loss=0.06529, simple_loss=0.08907, pruned_loss=0.01172, audio_tagging_loss=0.009033, over 2897055.66 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:26:08,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3771493.3333333335, ans=0.125 2023-11-29 02:26:16,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.960e+01 9.657e+01 1.065e+02 1.783e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:26:27,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3771626.6666666665, ans=0.1 2023-11-29 02:26:30,102 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-29 02:26:32,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3771626.6666666665, ans=0.1 2023-11-29 02:26:33,670 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:27:02,382 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 650, loss[loss=0.06846, simple_loss=0.09967, pruned_loss=0.01261, audio_tagging_loss=0.006008, over 15197.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0896, pruned_loss=0.01196, audio_tagging_loss=0.008998, over 2930020.76 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:27:31,200 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-29 02:27:38,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3772026.6666666665, ans=0.1 2023-11-29 02:27:50,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2023-11-29 02:28:03,586 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 700, loss[loss=0.05311, simple_loss=0.06644, pruned_loss=0.01027, audio_tagging_loss=0.009622, over 15414.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08942, pruned_loss=0.01191, audio_tagging_loss=0.008824, over 2957275.68 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:28:05,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2023-11-29 02:28:16,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. 
2023-11-29 02:28:16,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2023-11-29 02:28:17,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3772226.6666666665, ans=0.025 2023-11-29 02:28:19,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.910e+01 9.498e+01 1.006e+02 1.347e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:28:20,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3772226.6666666665, ans=0.125 2023-11-29 02:28:32,965 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-29 02:28:39,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3772360.0, ans=0.125 2023-11-29 02:28:58,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3772426.6666666665, ans=0.0 2023-11-29 02:29:04,558 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 750, loss[loss=0.09191, simple_loss=0.1129, pruned_loss=0.02318, audio_tagging_loss=0.01227, over 15996.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08991, pruned_loss=0.01206, audio_tagging_loss=0.008802, over 2981551.15 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:29:04,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3772493.3333333335, ans=0.0 2023-11-29 02:29:19,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3772560.0, ans=0.125 2023-11-29 02:29:19,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2023-11-29 02:29:32,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3772626.6666666665, ans=0.0 2023-11-29 02:29:33,927 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-29 02:29:43,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-11-29 02:29:52,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3772760.0, ans=0.0 2023-11-29 02:29:53,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3772760.0, ans=0.2 2023-11-29 02:29:58,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3772760.0, ans=0.2 2023-11-29 02:29:58,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3772760.0, ans=0.125
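The recurring model.py:807 heartbeat ("Freeze_encoder: False; Current batch idx: N", printed every 50 batches here) tracks an optional encoder-freezing switch that is disabled in this run. A minimal sketch of how such a switch might be wired; the names freeze_until and log_every are assumptions, not taken from the recipe:

```python
def maybe_freeze_encoder(model, batch_idx: int,
                         freeze_until: int = -1, log_every: int = 50):
    # freeze_until=-1 disables freezing entirely, as in this run
    freeze = batch_idx < freeze_until
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    if batch_idx % log_every == 0:
        print(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
```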
2023-11-29 02:30:05,608 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 800, loss[loss=0.09029, simple_loss=0.1398, pruned_loss=0.01426, audio_tagging_loss=0.006109, over 15827.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08984, pruned_loss=0.01197, audio_tagging_loss=0.008831, over 2999104.00 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:30:12,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3772826.6666666665, ans=0.1 2023-11-29 02:30:21,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.034e+01 9.772e+01 1.029e+02 1.331e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:30:35,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-29 02:30:52,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3773026.6666666665, ans=0.0 2023-11-29 02:31:06,987 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 850, loss[loss=0.09155, simple_loss=0.1255, pruned_loss=0.02211, audio_tagging_loss=0.00672, over 15177.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08841, pruned_loss=0.01182, audio_tagging_loss=0.008964, over 3012375.11 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:31:14,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3773160.0, ans=0.05 2023-11-29 02:31:35,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3773293.3333333335, ans=0.125 2023-11-29 02:31:36,537 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-29 02:31:48,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-29 02:31:53,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3773360.0, ans=0.125 2023-11-29 02:31:55,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2023-11-29 02:31:56,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3773426.6666666665, ans=0.125 2023-11-29 02:32:08,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3773493.3333333335, ans=0.125 2023-11-29 02:32:09,894 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 900, loss[loss=0.07709, simple_loss=0.1168, pruned_loss=0.01279, audio_tagging_loss=0.005926, over 15134.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0893, pruned_loss=0.01196, audio_tagging_loss=0.008986, over 3026024.03 frames.
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:32:26,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 9.124e+01 9.810e+01 1.032e+02 1.259e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 02:32:36,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3773626.6666666665, ans=0.125 2023-11-29 02:32:39,599 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-29 02:32:50,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773693.3333333335, ans=0.1 2023-11-29 02:33:05,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3773760.0, ans=0.125 2023-11-29 02:33:11,555 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 950, loss[loss=0.06697, simple_loss=0.09285, pruned_loss=0.01318, audio_tagging_loss=0.007361, over 16261.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08925, pruned_loss=0.01183, audio_tagging_loss=0.008905, over 3028884.67 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:33:14,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-29 02:33:28,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3773893.3333333335, ans=0.2 2023-11-29 02:33:32,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-29 02:33:42,034 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-29 02:33:43,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3773960.0, ans=0.0 2023-11-29 02:34:13,551 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1000, loss[loss=0.07161, simple_loss=0.0952, pruned_loss=0.01725, audio_tagging_loss=0.006765, over 14931.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08905, pruned_loss=0.01181, audio_tagging_loss=0.008824, over 3028424.32 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:34:14,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-29 02:34:20,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3774160.0, ans=0.0 2023-11-29 02:34:27,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3774226.6666666665, ans=0.0 2023-11-29 02:34:30,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.000e+01 9.678e+01 1.023e+02 1.395e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 02:34:32,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3774226.6666666665, ans=0.125
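The train_asr.py:1481 warnings in this stretch drop AudioSet cuts whose transcript is a dummy placeholder: each cut's 100 input frames shrink to 23 encoder frames after subsampling, too few to align 24 BPE tokens under the transducer loss. A sketch of that validity check; the edge_frames constant is an assumption chosen only to reproduce the logged 100 -> 23 arithmetic:

```python
def is_valid_cut(num_input_frames: int, num_tokens: int,
                 subsampling_factor: int = 4, edge_frames: int = 7) -> bool:
    # edge_frames=7 is an assumed convolutional edge loss, picked so that
    # (100 - 7) // 4 == 23, the numbers reported in the warnings.
    t = (num_input_frames - edge_frames) // subsampling_factor
    # A transducer cannot emit more symbols than it has encoder frames.
    return t >= num_tokens

print(is_valid_cut(100, 24))   # -> False: 23 frames cannot carry 24 tokens
```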
2023-11-29 02:34:41,591 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:34:42,778 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-29 02:35:05,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3774426.6666666665, ans=0.125 2023-11-29 02:35:15,270 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1050, loss[loss=0.07177, simple_loss=0.1017, pruned_loss=0.01336, audio_tagging_loss=0.007534, over 15856.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08879, pruned_loss=0.01186, audio_tagging_loss=0.008715, over 3024227.65 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:35:15,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2023-11-29 02:35:27,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:28,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:32,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:33,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:40,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3774626.6666666665, ans=0.0 2023-11-29 02:35:44,129 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-29 02:36:16,943 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1100, loss[loss=0.06542, simple_loss=0.09689, pruned_loss=0.01047, audio_tagging_loss=0.0065, over 15617.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.0886, pruned_loss=0.01191, audio_tagging_loss=0.008653, over 3027065.05 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:36:19,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3774826.6666666665, ans=0.05 2023-11-29 02:36:21,720 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:36:34,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.921e+01 9.429e+01 9.964e+01 1.346e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:36:40,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0
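The scaling.py:1022 records compare a per-module whitening metric against a limit (6.0, 10.0, 15.0, or 22.5 depending on the module type). A plausible reading consistent with the logged values: the metric is 1.0 when the feature covariance has equal eigenvalues ("white" activations) and grows as the covariance becomes lopsided, and a penalty engages only once the metric exceeds the limit. A hedged reconstruction of such a statistic, not the exact scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels), assumed zero-mean."""
    cov = x.t() @ x / x.shape[0]
    d = cov.shape[0]
    # trace(C @ C) * d / trace(C)**2 == d * sum(eig**2) / sum(eig)**2,
    # which is >= 1.0 and equals 1.0 iff all eigenvalues are equal.
    return float((cov @ cov).trace() * d / cov.trace() ** 2)

x = torch.randn(10000, 256)    # white features
print(whitening_metric(x))     # close to 1.0, up to sampling noise
```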
2023-11-29 02:36:44,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.11 vs. limit=22.5 2023-11-29 02:36:47,053 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-29 02:36:51,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3774960.0, ans=0.1 2023-11-29 02:36:55,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-29 02:37:02,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3775026.6666666665, ans=0.1 2023-11-29 02:37:10,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-29 02:37:19,256 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1150, loss[loss=0.05762, simple_loss=0.08389, pruned_loss=0.008043, audio_tagging_loss=0.007627, over 14573.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08773, pruned_loss=0.01172, audio_tagging_loss=0.008653, over 3033645.76 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:37:49,356 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-29 02:37:49,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3775293.3333333335, ans=0.125 2023-11-29 02:38:12,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.62 vs. limit=15.0 2023-11-29 02:38:21,964 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1200, loss[loss=0.08135, simple_loss=0.1185, pruned_loss=0.01594, audio_tagging_loss=0.006165, over 15288.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08855, pruned_loss=0.01194, audio_tagging_loss=0.008538, over 3042042.73 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:38:24,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3775493.3333333335, ans=0.125 2023-11-29 02:38:39,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.292e+01 9.106e+01 9.655e+01 1.032e+02 1.347e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:38:44,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3775560.0, ans=0.0 2023-11-29 02:38:47,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs.
limit=15.0 2023-11-29 02:38:51,472 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-29 02:38:52,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3775626.6666666665, ans=0.125 2023-11-29 02:39:01,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3775693.3333333335, ans=0.125 2023-11-29 02:39:06,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3775693.3333333335, ans=0.0 2023-11-29 02:39:15,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3775760.0, ans=0.125 2023-11-29 02:39:18,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2023-11-29 02:39:22,615 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:39:23,537 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1250, loss[loss=0.07554, simple_loss=0.1059, pruned_loss=0.01375, audio_tagging_loss=0.008855, over 14942.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08924, pruned_loss=0.01198, audio_tagging_loss=0.00853, over 3044914.87 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:39:53,086 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-29 02:39:54,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3775960.0, ans=0.025 2023-11-29 02:39:58,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3775960.0, ans=0.2 2023-11-29 02:39:59,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3775960.0, ans=0.125 2023-11-29 02:40:06,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3776026.6666666665, ans=6.0 2023-11-29 02:40:10,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3776026.6666666665, ans=0.0 2023-11-29 02:40:25,334 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1300, loss[loss=0.05865, simple_loss=0.08393, pruned_loss=0.00864, audio_tagging_loss=0.008044, over 15653.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08944, pruned_loss=0.01186, audio_tagging_loss=0.008483, over 3042037.33 frames. 
], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:40:43,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.895e+01 9.443e+01 1.023e+02 1.246e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-29 02:40:45,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3776226.6666666665, ans=0.2 2023-11-29 02:40:49,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3776293.3333333335, ans=0.0 2023-11-29 02:40:55,089 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-29 02:41:26,049 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1350, loss[loss=0.06254, simple_loss=0.08985, pruned_loss=0.01014, audio_tagging_loss=0.007477, over 15008.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08782, pruned_loss=0.01173, audio_tagging_loss=0.008523, over 3040601.87 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:41:39,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3776560.0, ans=0.125 2023-11-29 02:41:46,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3776560.0, ans=0.0 2023-11-29 02:41:57,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-29 02:42:13,926 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:42:19,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3776760.0, ans=0.125 2023-11-29 02:42:25,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-29 02:42:26,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-29 02:42:29,818 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1400, loss[loss=0.07254, simple_loss=0.1041, pruned_loss=0.01424, audio_tagging_loss=0.006245, over 15035.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08786, pruned_loss=0.01176, audio_tagging_loss=0.008606, over 3039695.54 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:42:47,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.922e+01 9.372e+01 1.016e+02 1.403e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 02:42:58,399 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-29 02:43:05,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:05,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:10,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:21,900 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:43:29,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3777160.0, ans=0.2 2023-11-29 02:43:30,451 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1450, loss[loss=0.0631, simple_loss=0.08799, pruned_loss=0.009814, audio_tagging_loss=0.009293, over 16354.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08853, pruned_loss=0.01186, audio_tagging_loss=0.008544, over 3044113.14 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:43:33,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3777160.0, ans=0.125 2023-11-29 02:43:34,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-11-29 02:43:45,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3777226.6666666665, ans=0.2 2023-11-29 02:43:54,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3777293.3333333335, ans=0.0 2023-11-29 02:44:00,629 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-29 02:44:13,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3777360.0, ans=0.125 2023-11-29 02:44:32,074 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1500, loss[loss=0.041, simple_loss=0.05179, pruned_loss=0.005017, audio_tagging_loss=0.01009, over 14408.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08907, pruned_loss=0.01198, audio_tagging_loss=0.008539, over 3045262.32 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:44:37,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3777493.3333333335, ans=0.2 2023-11-29 02:44:51,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.184e+01 9.950e+01 1.078e+02 1.281e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 02:45:02,431 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-29 02:45:04,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3777626.6666666665, ans=0.2 2023-11-29 02:45:11,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3777693.3333333335, ans=0.1 2023-11-29 02:45:13,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=15.0 2023-11-29 02:45:15,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3777693.3333333335, ans=0.125 2023-11-29 02:45:22,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777760.0, ans=0.1 2023-11-29 02:45:24,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-29 02:45:33,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3777826.6666666665, ans=0.0 2023-11-29 02:45:34,521 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1550, loss[loss=0.0436, simple_loss=0.05644, pruned_loss=0.004352, audio_tagging_loss=0.01103, over 14952.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08879, pruned_loss=0.01192, audio_tagging_loss=0.008643, over 3046491.01 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:45:38,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-29 02:45:40,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777826.6666666665, ans=0.1 2023-11-29 02:45:44,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-29 02:45:54,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3777893.3333333335, ans=0.125 2023-11-29 02:45:57,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=22.5 2023-11-29 02:45:57,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.25 vs. limit=15.0 2023-11-29 02:46:03,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-29 02:46:11,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. 
limit=10.0 2023-11-29 02:46:31,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3778093.3333333335, ans=0.5 2023-11-29 02:46:36,487 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1600, loss[loss=0.06849, simple_loss=0.1031, pruned_loss=0.009981, audio_tagging_loss=0.006959, over 16194.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08874, pruned_loss=0.01181, audio_tagging_loss=0.008787, over 3049856.18 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:46:36,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3778160.0, ans=0.125 2023-11-29 02:46:45,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2023-11-29 02:46:49,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3778226.6666666665, ans=0.125 2023-11-29 02:46:54,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.129e+01 9.735e+01 1.042e+02 2.046e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 02:46:57,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3778226.6666666665, ans=0.125 2023-11-29 02:47:06,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-29 02:47:11,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3778293.3333333335, ans=0.125 2023-11-29 02:47:21,406 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:47:22,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3778360.0, ans=0.1 2023-11-29 02:47:29,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3778426.6666666665, ans=0.0 2023-11-29 02:47:37,592 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1650, loss[loss=0.06442, simple_loss=0.07963, pruned_loss=0.01393, audio_tagging_loss=0.01068, over 14990.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08845, pruned_loss=0.01195, audio_tagging_loss=0.008932, over 3048640.05 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:47:53,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3778560.0, ans=0.125 2023-11-29 02:48:07,740 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-29 02:48:09,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:48:14,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-29 02:48:19,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.77 vs. 
limit=15.0 2023-11-29 02:48:23,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3778693.3333333335, ans=0.125 2023-11-29 02:48:40,048 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1700, loss[loss=0.05678, simple_loss=0.0737, pruned_loss=0.007081, audio_tagging_loss=0.01285, over 15842.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08951, pruned_loss=0.01201, audio_tagging_loss=0.008876, over 3051689.91 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:48:53,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0 2023-11-29 02:48:58,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 9.116e+01 9.697e+01 1.043e+02 1.617e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 02:49:03,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3778960.0, ans=0.125 2023-11-29 02:49:07,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=22.5 2023-11-29 02:49:08,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778960.0, ans=0.1 2023-11-29 02:49:09,455 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-29 02:49:15,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-29 02:49:19,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3779026.6666666665, ans=0.05 2023-11-29 02:49:22,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2023-11-29 02:49:24,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3779026.6666666665, ans=0.0 2023-11-29 02:49:32,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3779093.3333333335, ans=0.07 2023-11-29 02:49:36,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=12.0 2023-11-29 02:49:41,225 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1750, loss[loss=0.04613, simple_loss=0.06006, pruned_loss=0.006204, audio_tagging_loss=0.009896, over 14134.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08968, pruned_loss=0.01198, audio_tagging_loss=0.008847, over 3044165.08 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:50:00,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3779226.6666666665, ans=0.0 2023-11-29 02:50:11,326 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-29 02:50:16,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. 
limit=10.0 2023-11-29 02:50:28,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3779360.0, ans=0.0 2023-11-29 02:50:29,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-29 02:50:43,330 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1800, loss[loss=0.05828, simple_loss=0.07804, pruned_loss=0.01272, audio_tagging_loss=0.006543, over 13607.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08936, pruned_loss=0.01189, audio_tagging_loss=0.008704, over 3049304.39 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:50:53,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3779493.3333333335, ans=0.125 2023-11-29 02:50:55,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3779560.0, ans=0.125 2023-11-29 02:51:01,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3779560.0, ans=0.125 2023-11-29 02:51:02,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 9.130e+01 9.771e+01 1.053e+02 1.389e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:51:13,224 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-29 02:51:45,212 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1850, loss[loss=0.07021, simple_loss=0.08408, pruned_loss=0.0182, audio_tagging_loss=0.009971, over 14166.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08919, pruned_loss=0.01184, audio_tagging_loss=0.008676, over 3054435.45 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:52:15,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-29 02:52:47,567 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1900, loss[loss=0.06547, simple_loss=0.09409, pruned_loss=0.009887, audio_tagging_loss=0.008536, over 14869.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08858, pruned_loss=0.0117, audio_tagging_loss=0.008622, over 3042955.12 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:06,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.033e+01 9.601e+01 1.005e+02 1.271e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 02:53:11,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=22.5 2023-11-29 02:53:17,389 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-29 02:53:18,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3780293.3333333335, ans=0.1 2023-11-29 02:53:43,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3780426.6666666665, ans=0.0 2023-11-29 02:53:48,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3780493.3333333335, ans=0.0 2023-11-29 02:53:49,027 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 1950, loss[loss=0.06874, simple_loss=0.09636, pruned_loss=0.0115, audio_tagging_loss=0.009061, over 14603.00 frames. 
], tot_loss[loss=0.06425, simple_loss=0.08798, pruned_loss=0.01165, audio_tagging_loss=0.008607, over 3039383.09 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:54,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-29 02:54:01,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3780560.0, ans=0.125 2023-11-29 02:54:14,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3780626.6666666665, ans=0.0 2023-11-29 02:54:18,121 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-29 02:54:51,070 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2000, loss[loss=0.05943, simple_loss=0.07796, pruned_loss=0.009382, audio_tagging_loss=0.01107, over 15010.00 frames. ], tot_loss[loss=0.06372, simple_loss=0.08692, pruned_loss=0.01154, audio_tagging_loss=0.008723, over 3031410.03 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:06,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. limit=10.0 2023-11-29 02:55:10,618 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.857e+01 9.495e+01 1.044e+02 1.385e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 02:55:20,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-29 02:55:22,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3780960.0, ans=0.125 2023-11-29 02:55:24,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3780960.0, ans=0.125 2023-11-29 02:55:25,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3780960.0, ans=0.2 2023-11-29 02:55:36,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3781026.6666666665, ans=15.0 2023-11-29 02:55:37,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3781026.6666666665, ans=0.0 2023-11-29 02:55:40,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3781093.3333333335, ans=0.2
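The grad_scale field that closes each train_asr.py:1235 record (16.0 and 32.0 in this stretch, dipping to 8.0 later in the excerpt) is the dynamic fp16 loss scale: it is halved whenever a step produces inf/nan gradients and grown again after a run of clean steps. A generic sketch using torch's own AMP API rather than whatever wrapper the recipe uses; the growth_interval value is an assumption:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # silently skips the step on inf/nan
    scaler.update()                # halves the scale on overflow, else
                                   # doubles it every growth_interval steps
    return scaler.get_scale()      # the value logged as grad_scale
```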
2023-11-29 02:55:52,042 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2050, loss[loss=0.06732, simple_loss=0.09944, pruned_loss=0.01116, audio_tagging_loss=0.006441, over 14873.00 frames. ], tot_loss[loss=0.06367, simple_loss=0.087, pruned_loss=0.01147, audio_tagging_loss=0.008699, over 3032511.49 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:56:14,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3781226.6666666665, ans=0.125 2023-11-29 02:56:21,414 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-29 02:56:30,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3781360.0, ans=0.0 2023-11-29 02:56:51,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3781426.6666666665, ans=0.0 2023-11-29 02:56:53,773 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2100, loss[loss=0.04115, simple_loss=0.0539, pruned_loss=0.00447, audio_tagging_loss=0.009727, over 13709.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08792, pruned_loss=0.01171, audio_tagging_loss=0.008629, over 3037457.96 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:56:59,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3781493.3333333335, ans=15.0 2023-11-29 02:57:01,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3781493.3333333335, ans=0.2 2023-11-29 02:57:13,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 9.043e+01 9.646e+01 1.030e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:57:20,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3781626.6666666665, ans=0.125 2023-11-29 02:57:23,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-29 02:57:24,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3781626.6666666665, ans=0.07 2023-11-29 02:57:24,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3781626.6666666665, ans=0.125 2023-11-29 02:57:55,483 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2150, loss[loss=0.08626, simple_loss=0.1208, pruned_loss=0.01821, audio_tagging_loss=0.007635, over 15439.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08804, pruned_loss=0.0118, audio_tagging_loss=0.008639, over 3036768.23 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:58:22,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3781960.0, ans=0.125 2023-11-29 02:58:25,428 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-29 02:58:32,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3782026.6666666665, ans=0.125 2023-11-29 02:58:34,720 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-29 02:58:46,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-29 02:58:51,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3782093.3333333335, ans=0.125 2023-11-29 02:58:56,771 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2200, loss[loss=0.06207, simple_loss=0.08757, pruned_loss=0.01244, audio_tagging_loss=0.005844, over 15135.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08845, pruned_loss=0.01176, audio_tagging_loss=0.008613, over 3038769.89 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:59:00,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3782160.0, ans=0.125 2023-11-29 02:59:16,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.088e+01 9.723e+01 1.043e+02 1.467e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 02:59:24,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3782293.3333333335, ans=0.2 2023-11-29 02:59:26,411 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-29 02:59:32,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3782293.3333333335, ans=0.0 2023-11-29 02:59:45,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3782426.6666666665, ans=0.0 2023-11-29 02:59:56,722 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:59:58,767 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2250, loss[loss=0.06527, simple_loss=0.09341, pruned_loss=0.0124, audio_tagging_loss=0.006161, over 15013.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08823, pruned_loss=0.01182, audio_tagging_loss=0.008618, over 3038755.82 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:00:04,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3782493.3333333335, ans=0.125 2023-11-29 03:00:04,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3782493.3333333335, ans=0.0 2023-11-29 03:00:12,636 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:00:23,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3782626.6666666665, ans=0.0 2023-11-29 03:00:25,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-11-29 03:00:29,195 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-29 03:00:55,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3782760.0, ans=0.0
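In the optim.py:476 records the five grad-norm quartiles read as min / 25% / median / 75% / max of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median: 2.0 * 9.723e+01 = 1.945e+02 in the record above, matching the logged threshold exactly. percent-clipped is then the share of norms beyond that threshold. A sketch under those assumptions:

```python
import torch

def clipping_threshold(recent_norms, clipping_scale: float = 2.0):
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(q[2])        # scale * median
    pct = float((norms > threshold).float().mean()) * 100.0
    print("grad-norm quartiles "
          + " ".join(f"{float(v):.3e}" for v in q)
          + f", threshold={threshold:.3e}, percent-clipped={pct}")
    return threshold
```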
2023-11-29 03:01:01,001 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2300, loss[loss=0.05625, simple_loss=0.07938, pruned_loss=0.009665, audio_tagging_loss=0.006893, over 14214.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08827, pruned_loss=0.0118, audio_tagging_loss=0.008708, over 3044399.77 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:01:01,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-29 03:01:14,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3782893.3333333335, ans=0.2 2023-11-29 03:01:20,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.062e+01 9.669e+01 1.048e+02 1.317e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 03:01:30,804 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-29 03:01:34,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3782960.0, ans=0.125 2023-11-29 03:01:44,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3783026.6666666665, ans=0.125 2023-11-29 03:01:47,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-29 03:01:47,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2023-11-29 03:01:58,209 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:02:02,752 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2350, loss[loss=0.09046, simple_loss=0.1279, pruned_loss=0.01949, audio_tagging_loss=0.007036, over 16536.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08819, pruned_loss=0.01191, audio_tagging_loss=0.008758, over 3041625.28 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:02:16,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3783226.6666666665, ans=0.125 2023-11-29 03:02:23,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3783226.6666666665, ans=0.125 2023-11-29 03:02:32,374 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-29 03:02:36,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-29 03:03:04,486 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2400, loss[loss=0.06639, simple_loss=0.08833, pruned_loss=0.01037, audio_tagging_loss=0.01186, over 16159.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08832, pruned_loss=0.01196, audio_tagging_loss=0.008784, over 3037773.87 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:03:05,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs.
limit=15.0 2023-11-29 03:03:14,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3783493.3333333335, ans=0.0 2023-11-29 03:03:24,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3783560.0, ans=0.5 2023-11-29 03:03:27,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.160e+01 9.857e+01 1.036e+02 1.512e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:03:34,338 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-29 03:04:05,684 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2450, loss[loss=0.0816, simple_loss=0.1077, pruned_loss=0.01955, audio_tagging_loss=0.008217, over 15060.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08844, pruned_loss=0.01193, audio_tagging_loss=0.008898, over 3034427.86 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:04:35,609 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-29 03:04:41,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3784026.6666666665, ans=0.125 2023-11-29 03:05:08,355 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2500, loss[loss=0.05496, simple_loss=0.07806, pruned_loss=0.006117, audio_tagging_loss=0.009817, over 14301.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08778, pruned_loss=0.0118, audio_tagging_loss=0.008895, over 3035691.48 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:05:29,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3784226.6666666665, ans=0.125 2023-11-29 03:05:30,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.821e+01 9.659e+01 1.073e+02 1.403e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-29 03:05:37,225 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-29 03:05:37,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3784293.3333333335, ans=0.125 2023-11-29 03:05:43,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3784360.0, ans=0.125 2023-11-29 03:05:54,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3784360.0, ans=0.125 2023-11-29 03:06:01,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3784426.6666666665, ans=0.0 2023-11-29 03:06:09,256 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2550, loss[loss=0.05679, simple_loss=0.07983, pruned_loss=0.007215, audio_tagging_loss=0.00966, over 15430.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08877, pruned_loss=0.01196, audio_tagging_loss=0.008742, over 3029523.66 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:06:18,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3784493.3333333335, ans=0.1 2023-11-29 03:06:18,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.25 vs. 
limit=15.0 2023-11-29 03:06:20,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3784560.0, ans=0.1 2023-11-29 03:06:29,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3784560.0, ans=0.0 2023-11-29 03:06:40,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-29 03:06:46,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=12.0 2023-11-29 03:06:52,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3784693.3333333335, ans=15.0 2023-11-29 03:06:53,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3784693.3333333335, ans=0.2 2023-11-29 03:07:05,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3784760.0, ans=0.0 2023-11-29 03:07:07,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3784760.0, ans=0.0 2023-11-29 03:07:12,040 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2600, loss[loss=0.05243, simple_loss=0.0691, pruned_loss=0.008778, audio_tagging_loss=0.009106, over 15994.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08892, pruned_loss=0.01198, audio_tagging_loss=0.008619, over 3033962.32 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:07:34,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.651e+01 9.416e+01 1.044e+02 1.400e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-29 03:07:35,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3784893.3333333335, ans=0.125 2023-11-29 03:07:42,217 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-29 03:07:54,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3785026.6666666665, ans=0.2 2023-11-29 03:08:14,966 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2650, loss[loss=0.07683, simple_loss=0.1066, pruned_loss=0.01685, audio_tagging_loss=0.006672, over 14997.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0889, pruned_loss=0.01203, audio_tagging_loss=0.008559, over 3032347.70 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:08:17,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3785160.0, ans=0.125 2023-11-29 03:08:23,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3785160.0, ans=0.0 2023-11-29 03:08:40,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3785293.3333333335, ans=0.1 2023-11-29 03:08:43,360 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-29 03:09:00,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. 
limit=15.0 2023-11-29 03:09:15,731 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2700, loss[loss=0.06714, simple_loss=0.08329, pruned_loss=0.01443, audio_tagging_loss=0.01106, over 16038.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0887, pruned_loss=0.01194, audio_tagging_loss=0.008578, over 3036624.59 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:09:24,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-11-29 03:09:26,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3785493.3333333335, ans=0.0 2023-11-29 03:09:38,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.139e+01 9.804e+01 1.056e+02 1.449e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 03:09:46,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-29 03:09:47,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3785626.6666666665, ans=0.2 2023-11-29 03:10:17,672 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2750, loss[loss=0.06945, simple_loss=0.1037, pruned_loss=0.0114, audio_tagging_loss=0.006197, over 15054.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08852, pruned_loss=0.01195, audio_tagging_loss=0.008615, over 3035530.42 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:10:26,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3785826.6666666665, ans=0.95 2023-11-29 03:10:47,667 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-29 03:10:51,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3785960.0, ans=0.0 2023-11-29 03:11:01,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-29 03:11:09,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3786093.3333333335, ans=0.125 2023-11-29 03:11:12,713 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:11:19,925 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2800, loss[loss=0.07165, simple_loss=0.1046, pruned_loss=0.01234, audio_tagging_loss=0.006993, over 16010.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08784, pruned_loss=0.01173, audio_tagging_loss=0.008554, over 3039202.04 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:11:34,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786226.6666666665, ans=0.1 2023-11-29 03:11:35,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3786226.6666666665, ans=0.0 2023-11-29 03:11:43,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.033e+01 9.912e+01 1.050e+02 3.585e+02, threshold=1.982e+02, percent-clipped=1.0 2023-11-29 03:11:49,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-29 03:11:52,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-29 03:11:54,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786293.3333333335, ans=0.1 2023-11-29 03:12:06,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786360.0, ans=0.1 2023-11-29 03:12:21,951 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2850, loss[loss=0.06658, simple_loss=0.09617, pruned_loss=0.0117, audio_tagging_loss=0.006794, over 16171.00 frames. ], tot_loss[loss=0.06397, simple_loss=0.08755, pruned_loss=0.01171, audio_tagging_loss=0.008487, over 3040262.50 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:12:51,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-29 03:13:10,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3786693.3333333335, ans=0.125 2023-11-29 03:13:25,917 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2900, loss[loss=0.06636, simple_loss=0.09863, pruned_loss=0.0086, audio_tagging_loss=0.008448, over 16262.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08795, pruned_loss=0.01178, audio_tagging_loss=0.008471, over 3043544.13 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:13:43,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3786893.3333333335, ans=0.0 2023-11-29 03:13:51,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 9.075e+01 9.599e+01 1.049e+02 1.799e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 03:13:56,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-29 03:13:58,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.28 vs. limit=5.0 2023-11-29 03:14:13,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-29 03:14:19,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-29 03:14:28,239 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 2950, loss[loss=0.06466, simple_loss=0.08393, pruned_loss=0.01199, audio_tagging_loss=0.0107, over 14128.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08923, pruned_loss=0.012, audio_tagging_loss=0.008497, over 3046085.13 frames. 
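
The optim.py:476 lines summarize the gradient norms seen since the previous report as five quantiles (min, 25%, median, 75%, max). In every entry here the printed threshold is exactly Clipping_scale times the median, e.g. 2.0 * 9.912e+01 = 1.982e+02 in the entry above, and percent-clipped is the share of batches whose norm exceeded that threshold (here the max of 3.585e+02 did). A sketch of median-relative clipping under that reading (illustrative; in icefall this logic lives inside the ScaledAdam optimizer itself):

    import torch

    def clip_by_median_norm(parameters, recent_norms, clipping_scale=2.0):
        """Clip grads at clipping_scale x median of recently observed norms."""
        threshold = clipping_scale * float(torch.tensor(recent_norms).median())
        total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
        return float(total_norm), threshold
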
], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:14:40,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3787226.6666666665, ans=0.125 2023-11-29 03:14:47,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2023-11-29 03:14:57,924 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-29 03:15:06,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=22.5 2023-11-29 03:15:30,061 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3000, loss[loss=0.05789, simple_loss=0.08444, pruned_loss=0.008003, audio_tagging_loss=0.007671, over 15377.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08944, pruned_loss=0.01203, audio_tagging_loss=0.008492, over 3049381.53 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:15:30,062 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 03:15:45,703 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6188, 4.3283, 2.8555, 3.9593], device='cuda:1') 2023-11-29 03:16:11,308 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05793, simple_loss=0.05039, pruned_loss=0.005256, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-29 03:16:11,308 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 03:16:31,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3787560.0, ans=0.125 2023-11-29 03:16:35,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 9.350e+01 9.749e+01 1.060e+02 2.355e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-29 03:16:41,310 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-29 03:16:51,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3787693.3333333335, ans=0.0 2023-11-29 03:16:56,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2023-11-29 03:17:04,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.97 vs. limit=6.0 2023-11-29 03:17:13,318 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3050, loss[loss=0.06663, simple_loss=0.0888, pruned_loss=0.01335, audio_tagging_loss=0.00888, over 14750.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08915, pruned_loss=0.01191, audio_tagging_loss=0.008564, over 3055244.26 frames. 
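
Within the validation block above, the zipformer.py:1877 lines print attn_weights_entropy: the mean entropy, in nats, of each attention head's weight distribution, one value per head. It is a quick collapse diagnostic: values near log(key_len) mean near-uniform attention, values near zero mean heads locked onto single frames. A sketch of that computation (shapes and names assumed for illustration):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len); rows are softmax outputs.
        entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
        return entropy.mean(dim=-1)  # one value per head

    # Uniform attention over 128 keys gives log(128) ~= 4.85 nats per head,
    # the same order as the ~3-5 nat values logged above.
    attn = torch.full((4, 10, 128), 1.0 / 128)
    print(attn_weights_entropy(attn))  # -> tensor([4.8520, 4.8520, ...])
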
], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:17:26,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3787893.3333333335, ans=0.125 2023-11-29 03:17:28,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3787893.3333333335, ans=0.0 2023-11-29 03:17:29,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3787893.3333333335, ans=0.125 2023-11-29 03:17:42,917 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-29 03:17:51,318 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:17:57,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2023-11-29 03:17:57,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-29 03:18:07,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3788093.3333333335, ans=0.0 2023-11-29 03:18:10,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3788093.3333333335, ans=0.125 2023-11-29 03:18:15,757 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3100, loss[loss=0.07248, simple_loss=0.1027, pruned_loss=0.01376, audio_tagging_loss=0.007373, over 14951.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08811, pruned_loss=0.01177, audio_tagging_loss=0.008643, over 3051212.29 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:18:28,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3788226.6666666665, ans=0.09899494936611666 2023-11-29 03:18:39,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.957e+01 9.617e+01 1.028e+02 1.274e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 03:18:42,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3788293.3333333335, ans=0.125 2023-11-29 03:18:45,238 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-29 03:19:17,477 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3150, loss[loss=0.07595, simple_loss=0.09903, pruned_loss=0.01711, audio_tagging_loss=0.009328, over 15168.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08834, pruned_loss=0.01184, audio_tagging_loss=0.008778, over 3048941.07 frames. 
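
The WARNING above (and its repeats elsewhere in this stretch) is the training-time validity filter at work on AudioSet cuts that carry only a placeholder transcript: a 1-second cut of 100 feature frames shrinks to 23 encoder frames after the roughly 4x subsampling, fewer than its 24 BPE tokens, and a transducer that emits at most one symbol per frame cannot align that, so the cut is dropped. A sketch of the check (the exact subsampling arithmetic is an assumption, chosen to reproduce the logged 100 -> 23 mapping):

    # Sketch of the validity check behind the "Exclude cut" warnings.
    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed Conv2d-style subsampling (~factor 4): maps 100 -> 23,
        # matching the frame counts in the warning above.
        return ((num_frames - 7) // 2 + 1) // 2

    def is_trainable(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # -> 23
    print(is_trainable(100, 24))          # -> False: cut excluded
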
], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:19:27,175 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:19:32,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3788560.0, ans=0.125 2023-11-29 03:19:40,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3788560.0, ans=0.125 2023-11-29 03:19:41,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:19:41,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-29 03:19:47,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-29 03:20:19,187 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3200, loss[loss=0.06348, simple_loss=0.08113, pruned_loss=0.01256, audio_tagging_loss=0.01036, over 15747.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08798, pruned_loss=0.01177, audio_tagging_loss=0.00885, over 3039909.01 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:20:33,968 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:20:40,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3788893.3333333335, ans=0.125 2023-11-29 03:20:44,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.999e+01 9.702e+01 1.062e+02 1.415e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:20:49,408 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-29 03:21:12,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3789093.3333333335, ans=0.125 2023-11-29 03:21:21,248 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3250, loss[loss=0.07148, simple_loss=0.09416, pruned_loss=0.01616, audio_tagging_loss=0.008236, over 15438.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.0875, pruned_loss=0.0116, audio_tagging_loss=0.008877, over 3042709.11 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:21:26,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-11-29 03:21:40,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3789226.6666666665, ans=0.0 2023-11-29 03:21:50,762 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-29 03:21:54,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=22.5 2023-11-29 03:21:57,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3789293.3333333335, ans=0.1 2023-11-29 03:22:13,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3789426.6666666665, ans=0.125 2023-11-29 03:22:17,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3789426.6666666665, ans=0.1 2023-11-29 03:22:24,189 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3300, loss[loss=0.05214, simple_loss=0.06637, pruned_loss=0.008038, audio_tagging_loss=0.01091, over 15313.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.0872, pruned_loss=0.0115, audio_tagging_loss=0.008929, over 3035899.74 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:22:32,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=22.5 2023-11-29 03:22:40,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3789560.0, ans=0.125 2023-11-29 03:22:48,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.011e+01 9.553e+01 1.025e+02 1.344e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 03:22:53,485 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-29 03:22:58,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3789626.6666666665, ans=0.0 2023-11-29 03:23:01,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3789693.3333333335, ans=0.0 2023-11-29 03:23:06,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3789693.3333333335, ans=0.1 2023-11-29 03:23:09,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3789693.3333333335, ans=0.125 2023-11-29 03:23:25,037 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3350, loss[loss=0.0695, simple_loss=0.1087, pruned_loss=0.009378, audio_tagging_loss=0.005793, over 15424.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08732, pruned_loss=0.01152, audio_tagging_loss=0.00885, over 3031812.72 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:23:32,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3789826.6666666665, ans=0.125 2023-11-29 03:23:38,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-29 03:23:43,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3789893.3333333335, ans=0.125 2023-11-29 03:23:49,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3789960.0, ans=0.0 2023-11-29 03:23:55,325 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-29 03:24:08,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.72 vs. 
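
The scaling.py:1022 "Whitening" lines compare each Whiten module's metric against its limit (e.g. metric=13.13 vs. limit=15.0 above); the module adds a gradient penalty that nudges activation covariance back toward isotropy once the metric exceeds the limit, so these lines show how close each module runs to that boundary. One plausible formulation of such a metric, equal to 1.0 for perfectly white features and approaching num_channels as the covariance degenerates (an assumed reconstruction, not a copy of icefall's scaling.Whiten):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]  # (C, C) covariance estimate
        num_channels = cov.shape[0]
        # Squared-entry mass relative to what a white covariance would give.
        return float((cov ** 2).sum() / (num_channels * cov.diag().mean() ** 2))

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise
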
limit=6.0 2023-11-29 03:24:26,774 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3400, loss[loss=0.07114, simple_loss=0.09584, pruned_loss=0.01145, audio_tagging_loss=0.01177, over 14723.00 frames. ], tot_loss[loss=0.06398, simple_loss=0.08748, pruned_loss=0.01151, audio_tagging_loss=0.008734, over 3034901.66 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:24:27,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3790160.0, ans=0.125 2023-11-29 03:24:45,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3790226.6666666665, ans=0.0 2023-11-29 03:24:51,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.195e+01 9.061e+01 9.646e+01 1.021e+02 1.209e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 03:24:56,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-29 03:25:18,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790426.6666666665, ans=0.1 2023-11-29 03:25:26,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3790426.6666666665, ans=0.125 2023-11-29 03:25:28,268 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3450, loss[loss=0.04572, simple_loss=0.05407, pruned_loss=0.007189, audio_tagging_loss=0.0115, over 15916.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08811, pruned_loss=0.01155, audio_tagging_loss=0.008677, over 3043250.83 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:25:58,457 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-29 03:26:15,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3790693.3333333335, ans=0.95 2023-11-29 03:26:30,416 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3500, loss[loss=0.0757, simple_loss=0.1091, pruned_loss=0.01627, audio_tagging_loss=0.004902, over 15383.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08781, pruned_loss=0.01163, audio_tagging_loss=0.008524, over 3039928.43 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:26:54,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.797e+01 9.462e+01 1.015e+02 1.238e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 03:27:00,139 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-29 03:27:04,699 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:27:16,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3791026.6666666665, ans=0.125 2023-11-29 03:27:31,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3791160.0, ans=0.0 2023-11-29 03:27:32,447 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3550, loss[loss=0.07432, simple_loss=0.1025, pruned_loss=0.01595, audio_tagging_loss=0.007101, over 14655.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08796, pruned_loss=0.01174, audio_tagging_loss=0.008504, over 3036545.68 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:27:38,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3791160.0, ans=0.125 2023-11-29 03:28:01,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-29 03:28:12,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3791360.0, ans=0.0 2023-11-29 03:28:34,004 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3600, loss[loss=0.0682, simple_loss=0.08897, pruned_loss=0.01555, audio_tagging_loss=0.008166, over 14688.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08793, pruned_loss=0.01175, audio_tagging_loss=0.008526, over 3041264.55 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:28:42,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3791493.3333333335, ans=0.125 2023-11-29 03:28:46,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2023-11-29 03:28:59,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.871e+01 9.571e+01 1.037e+02 1.255e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 03:29:04,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-29 03:29:09,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3791693.3333333335, ans=0.1 2023-11-29 03:29:16,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-29 03:29:23,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3791760.0, ans=0.125 2023-11-29 03:29:31,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3791760.0, ans=0.0 2023-11-29 03:29:35,712 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3650, loss[loss=0.07595, simple_loss=0.1093, pruned_loss=0.01597, audio_tagging_loss=0.005335, over 14016.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0889, pruned_loss=0.01209, audio_tagging_loss=0.008459, over 3039333.64 frames. 
], batch size: 51, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:29:51,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3791893.3333333335, ans=0.09899494936611666 2023-11-29 03:29:57,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3791893.3333333335, ans=0.2 2023-11-29 03:30:02,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3791960.0, ans=0.2 2023-11-29 03:30:05,475 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-29 03:30:05,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3791960.0, ans=0.025 2023-11-29 03:30:10,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3791960.0, ans=0.1 2023-11-29 03:30:21,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3792026.6666666665, ans=0.125 2023-11-29 03:30:29,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3792093.3333333335, ans=0.125 2023-11-29 03:30:31,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-29 03:30:37,615 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3700, loss[loss=0.06353, simple_loss=0.09263, pruned_loss=0.01064, audio_tagging_loss=0.006581, over 14750.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08906, pruned_loss=0.01202, audio_tagging_loss=0.008471, over 3040543.93 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:30:42,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3792160.0, ans=0.125 2023-11-29 03:30:59,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3792226.6666666665, ans=0.125 2023-11-29 03:31:00,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3792226.6666666665, ans=0.125 2023-11-29 03:31:03,481 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 9.169e+01 9.957e+01 1.078e+02 1.355e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 03:31:07,249 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-29 03:31:35,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3792426.6666666665, ans=0.2 2023-11-29 03:31:40,470 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3750, loss[loss=0.0582, simple_loss=0.08017, pruned_loss=0.01083, audio_tagging_loss=0.007284, over 16241.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08884, pruned_loss=0.01193, audio_tagging_loss=0.008464, over 3045044.68 frames. 
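
Every batch line in this stretch reports lr: 1.41e-03. That value is consistent with icefall's Eden schedule, which decays the learning rate jointly in batch index and epoch: lr = base_lr * ((step/lr_batches)^2 + 1)^-0.25 * ((epoch/lr_epochs)^2 + 1)^-0.25. The hyperparameters below (base_lr=0.045, lr_batches=7500, lr_epochs=3.5), and the scheduler's epoch counter lagging the displayed epoch by one, are assumptions that happen to reproduce the logged value:

    # Sketch of the Eden learning-rate schedule; hyperparameters assumed.
    def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500, lr_epochs=3.5):
        return (base_lr
                * ((step / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # Around batch idx 568800 in (displayed) epoch 48:
    print(eden_lr(step=568800, epoch=47))  # -> ~1.41e-03, as logged
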
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:31:47,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3792493.3333333335, ans=0.0 2023-11-29 03:31:48,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3792493.3333333335, ans=0.2 2023-11-29 03:31:57,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.17 vs. limit=10.0 2023-11-29 03:31:58,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3792560.0, ans=0.125 2023-11-29 03:32:11,268 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-29 03:32:26,234 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:32:26,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3792693.3333333335, ans=0.1 2023-11-29 03:32:27,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3792693.3333333335, ans=0.0 2023-11-29 03:32:39,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3792760.0, ans=0.05 2023-11-29 03:32:40,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3792760.0, ans=0.125 2023-11-29 03:32:42,221 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3800, loss[loss=0.07961, simple_loss=0.1147, pruned_loss=0.0148, audio_tagging_loss=0.007439, over 15306.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08908, pruned_loss=0.01188, audio_tagging_loss=0.008466, over 3051521.14 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:33:07,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3792960.0, ans=0.0 2023-11-29 03:33:08,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.049e+01 9.763e+01 1.085e+02 1.488e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 03:33:11,962 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-29 03:33:13,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3792960.0, ans=0.2 2023-11-29 03:33:19,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3793026.6666666665, ans=0.1 2023-11-29 03:33:21,138 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:33:28,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3793026.6666666665, ans=0.125 2023-11-29 03:33:44,627 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3850, loss[loss=0.082, simple_loss=0.1033, pruned_loss=0.01997, audio_tagging_loss=0.0104, over 14355.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08879, pruned_loss=0.01176, audio_tagging_loss=0.008568, over 3057170.44 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:33:47,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3793160.0, ans=0.0 2023-11-29 03:34:13,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-29 03:34:14,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3793293.3333333335, ans=0.2 2023-11-29 03:34:25,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3793360.0, ans=0.1 2023-11-29 03:34:41,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2023-11-29 03:34:45,182 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3900, loss[loss=0.0668, simple_loss=0.09298, pruned_loss=0.01437, audio_tagging_loss=0.005934, over 14207.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08965, pruned_loss=0.01195, audio_tagging_loss=0.008538, over 3059206.26 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:35:10,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 9.044e+01 9.626e+01 1.053e+02 1.477e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 03:35:15,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-29 03:35:24,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3793693.3333333335, ans=0.125 2023-11-29 03:35:39,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3793760.0, ans=0.125 2023-11-29 03:35:46,734 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 3950, loss[loss=0.05611, simple_loss=0.07573, pruned_loss=0.009392, audio_tagging_loss=0.008851, over 16197.00 frames. 
], tot_loss[loss=0.06523, simple_loss=0.08937, pruned_loss=0.01188, audio_tagging_loss=0.008663, over 3049831.15 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:36:06,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-29 03:36:16,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-29 03:36:27,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3794026.6666666665, ans=0.125 2023-11-29 03:36:29,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3794026.6666666665, ans=0.0 2023-11-29 03:36:48,486 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4000, loss[loss=0.04412, simple_loss=0.05308, pruned_loss=0.005403, audio_tagging_loss=0.01218, over 14788.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0892, pruned_loss=0.01184, audio_tagging_loss=0.008749, over 3053868.01 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:37:04,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0 2023-11-29 03:37:14,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.940e+01 9.124e+01 9.854e+01 1.064e+02 1.398e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:37:18,346 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-29 03:37:18,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3794293.3333333335, ans=0.125 2023-11-29 03:37:21,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3794293.3333333335, ans=0.1 2023-11-29 03:37:43,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3794426.6666666665, ans=0.0 2023-11-29 03:37:49,702 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4050, loss[loss=0.08548, simple_loss=0.1185, pruned_loss=0.01746, audio_tagging_loss=0.008772, over 14625.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08955, pruned_loss=0.01187, audio_tagging_loss=0.008744, over 3059341.87 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:37:54,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3794493.3333333335, ans=0.0 2023-11-29 03:37:55,521 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:38:07,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3794560.0, ans=0.2 2023-11-29 03:38:10,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3794560.0, ans=0.09899494936611666 2023-11-29 03:38:14,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3794626.6666666665, ans=0.1 2023-11-29 03:38:19,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-29 03:38:25,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3794626.6666666665, ans=0.95 2023-11-29 03:38:26,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2023-11-29 03:38:48,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3794760.0, ans=0.125 2023-11-29 03:38:51,653 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4100, loss[loss=0.0622, simple_loss=0.08397, pruned_loss=0.009584, audio_tagging_loss=0.01063, over 15567.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09003, pruned_loss=0.01193, audio_tagging_loss=0.008787, over 3057850.25 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:38:56,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2023-11-29 03:39:17,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3794960.0, ans=0.0 2023-11-29 03:39:19,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 9.014e+01 9.699e+01 1.029e+02 1.254e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:39:21,935 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-29 03:39:23,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3794960.0, ans=0.07 2023-11-29 03:39:53,510 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4150, loss[loss=0.07373, simple_loss=0.1052, pruned_loss=0.01137, audio_tagging_loss=0.00974, over 16018.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09027, pruned_loss=0.01189, audio_tagging_loss=0.008632, over 3052081.86 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:40:14,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795226.6666666665, ans=0.125 2023-11-29 03:40:18,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3795293.3333333335, ans=0.2 2023-11-29 03:40:22,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-29 03:40:23,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. 
limit=15.0 2023-11-29 03:40:31,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3795360.0, ans=0.1 2023-11-29 03:40:41,485 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:40:46,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3795426.6666666665, ans=0.1 2023-11-29 03:40:48,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3795426.6666666665, ans=0.125 2023-11-29 03:40:49,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3795426.6666666665, ans=0.0 2023-11-29 03:40:54,931 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4200, loss[loss=0.0748, simple_loss=0.1151, pruned_loss=0.01212, audio_tagging_loss=0.005122, over 15540.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09074, pruned_loss=0.01188, audio_tagging_loss=0.008555, over 3055958.11 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:40:57,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3795493.3333333335, ans=0.1 2023-11-29 03:41:13,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3795560.0, ans=0.125 2023-11-29 03:41:13,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3795560.0, ans=0.5 2023-11-29 03:41:21,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.132e+01 9.847e+01 1.051e+02 1.276e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-29 03:41:23,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-29 03:41:24,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3795626.6666666665, ans=0.1 2023-11-29 03:41:25,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:27,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:31,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3795693.3333333335, ans=0.125 2023-11-29 03:41:45,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3795760.0, ans=0.125 2023-11-29 03:41:46,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3795760.0, ans=0.1 2023-11-29 03:41:56,108 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4250, loss[loss=0.07448, simple_loss=0.08967, 
pruned_loss=0.02045, audio_tagging_loss=0.00919, over 14499.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09036, pruned_loss=0.01199, audio_tagging_loss=0.008489, over 3052496.73 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:41:57,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.76 vs. limit=10.0 2023-11-29 03:42:13,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3795893.3333333335, ans=0.125 2023-11-29 03:42:24,986 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-29 03:42:49,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3796093.3333333335, ans=0.05 2023-11-29 03:42:51,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3796093.3333333335, ans=0.0 2023-11-29 03:42:57,147 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4300, loss[loss=0.06576, simple_loss=0.09035, pruned_loss=0.01263, audio_tagging_loss=0.007955, over 16205.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08981, pruned_loss=0.01191, audio_tagging_loss=0.008444, over 3044410.28 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:43:07,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3796160.0, ans=0.125 2023-11-29 03:43:11,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3796226.6666666665, ans=0.0 2023-11-29 03:43:22,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3796293.3333333335, ans=0.125 2023-11-29 03:43:24,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 9.187e+01 9.912e+01 1.060e+02 1.366e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 03:43:26,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3796293.3333333335, ans=0.125 2023-11-29 03:43:27,212 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-29 03:43:35,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3796360.0, ans=0.0 2023-11-29 03:43:58,129 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4350, loss[loss=0.06998, simple_loss=0.09737, pruned_loss=0.01266, audio_tagging_loss=0.008634, over 14396.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09018, pruned_loss=0.01211, audio_tagging_loss=0.008361, over 3047701.81 frames. 
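
The grad_scale field in the batch lines steps between 8.0, 16.0 and 32.0 across this stretch, which is dynamic fp16 loss scaling at work: the scaler halves the scale when a step produces inf/nan gradients and doubles it again after a long enough run of clean steps. A sketch with torch's GradScaler (the growth/backoff arguments are torch defaults; init_scale is set only to mirror the values logged here):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # matches the larger grad_scale values seen here
        backoff_factor=0.5,   # halve on overflow: 32 -> 16 -> 8
        growth_factor=2.0,    # double after growth_interval clean steps
        growth_interval=2000,
    )

    def training_step(model, optimizer, compute_loss, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # internally skipped if grads overflowed
        scaler.update()         # adjusts the scale as described above
        return loss.detach()
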
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:44:24,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3796626.6666666665, ans=0.025 2023-11-29 03:44:27,886 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-29 03:44:32,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3796626.6666666665, ans=10.0 2023-11-29 03:44:33,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3796626.6666666665, ans=0.125 2023-11-29 03:44:34,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796693.3333333335, ans=0.1 2023-11-29 03:44:39,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. limit=10.0 2023-11-29 03:45:00,081 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4400, loss[loss=0.06408, simple_loss=0.08018, pruned_loss=0.01355, audio_tagging_loss=0.01044, over 15629.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09051, pruned_loss=0.01221, audio_tagging_loss=0.00839, over 3041763.46 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:45:04,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3796826.6666666665, ans=0.2 2023-11-29 03:45:26,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.800e+01 8.869e+01 9.573e+01 1.013e+02 1.408e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 03:45:27,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-11-29 03:45:28,945 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-29 03:45:35,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3797026.6666666665, ans=0.1 2023-11-29 03:45:36,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3797026.6666666665, ans=0.125 2023-11-29 03:45:47,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3797093.3333333335, ans=0.125 2023-11-29 03:46:00,732 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4450, loss[loss=0.0652, simple_loss=0.09531, pruned_loss=0.01111, audio_tagging_loss=0.006439, over 16755.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08976, pruned_loss=0.01196, audio_tagging_loss=0.008425, over 3049106.79 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:46:24,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. 
limit=15.0 2023-11-29 03:46:30,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-29 03:46:51,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3797426.6666666665, ans=0.125 2023-11-29 03:46:56,413 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:47:02,127 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4500, loss[loss=0.05341, simple_loss=0.07351, pruned_loss=0.00918, audio_tagging_loss=0.007472, over 14080.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08917, pruned_loss=0.01186, audio_tagging_loss=0.008404, over 3043427.12 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:47:08,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3797493.3333333335, ans=0.125 2023-11-29 03:47:22,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3797560.0, ans=0.125 2023-11-29 03:47:26,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797626.6666666665, ans=0.1 2023-11-29 03:47:29,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.934e+01 9.508e+01 1.012e+02 1.257e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 03:47:31,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-29 03:47:50,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3797760.0, ans=0.0 2023-11-29 03:48:02,618 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4550, loss[loss=0.04068, simple_loss=0.04818, pruned_loss=0.006522, audio_tagging_loss=0.01007, over 14284.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08889, pruned_loss=0.01179, audio_tagging_loss=0.008387, over 3039241.70 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:48:14,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3797893.3333333335, ans=0.125 2023-11-29 03:48:32,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-29 03:48:34,246 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:48:36,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3797960.0, ans=0.05 2023-11-29 03:48:36,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797960.0, ans=0.1 2023-11-29 03:48:46,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-29 03:48:47,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3798026.6666666665, ans=0.0 2023-11-29 03:48:53,587 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:49:04,182 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4600, loss[loss=0.05914, simple_loss=0.07464, pruned_loss=0.01068, audio_tagging_loss=0.01114, over 13602.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.0881, pruned_loss=0.0118, audio_tagging_loss=0.008494, over 3038633.86 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:49:09,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3798160.0, ans=0.125 2023-11-29 03:49:14,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3798160.0, ans=15.0 2023-11-29 03:49:21,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3798226.6666666665, ans=0.125 2023-11-29 03:49:28,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3798293.3333333335, ans=0.125 2023-11-29 03:49:30,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.831e+01 9.354e+01 1.006e+02 1.240e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-29 03:49:33,416 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-29 03:49:37,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0 2023-11-29 03:50:05,632 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4650, loss[loss=0.06634, simple_loss=0.09231, pruned_loss=0.01211, audio_tagging_loss=0.00807, over 16321.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08789, pruned_loss=0.01192, audio_tagging_loss=0.008509, over 3037390.78 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:50:17,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3798560.0, ans=0.1 2023-11-29 03:50:29,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3798626.6666666665, ans=0.125 2023-11-29 03:50:30,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2023-11-29 03:50:34,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-29 03:50:49,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0 2023-11-29 03:50:51,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3798693.3333333335, ans=0.1 2023-11-29 03:50:57,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. 
limit=6.0 2023-11-29 03:51:05,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3798826.6666666665, ans=0.0 2023-11-29 03:51:06,092 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4700, loss[loss=0.06044, simple_loss=0.07268, pruned_loss=0.01118, audio_tagging_loss=0.01293, over 14430.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08727, pruned_loss=0.01185, audio_tagging_loss=0.008626, over 3031680.87 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:51:21,787 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:51:33,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.105e+01 9.941e+01 1.052e+02 1.267e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 03:51:36,253 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-29 03:51:40,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-11-29 03:51:41,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.59 vs. limit=6.0 2023-11-29 03:51:48,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3799026.6666666665, ans=0.1 2023-11-29 03:51:50,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3799026.6666666665, ans=0.125 2023-11-29 03:51:50,689 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:52:08,215 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4750, loss[loss=0.04806, simple_loss=0.07199, pruned_loss=0.003903, audio_tagging_loss=0.00816, over 15260.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08813, pruned_loss=0.01208, audio_tagging_loss=0.008614, over 3033594.03 frames. 
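
[Annotation] The loss lines above report four components per batch. With 'simple_loss_scale': 0.5 and 'audio_tagging_loss_scale': 1.0 from the hyperparameter dump at the start of this log, the headline loss is consistent with a plain weighted sum of the components (e.g. the batch 4500 entry earlier in this stretch reproduces the printed 0.05341). A minimal sketch of that reconstruction, assuming the weighted-sum form rather than quoting train_asr.py itself:

# Reconstruct the headline "loss" from its logged parts, assuming
# loss = simple_loss_scale * simple + pruned + at_scale * audio_tagging.
# The scales come from the hyperparameter dump; the formula is inferred
# from the logged numbers, not copied from train_asr.py.
SIMPLE_LOSS_SCALE = 0.5         # 'simple_loss_scale': 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0  # 'audio_tagging_loss_scale': 1.0

def combined_loss(simple: float, pruned: float, audio_tagging: float) -> float:
    return SIMPLE_LOSS_SCALE * simple + pruned + AUDIO_TAGGING_LOSS_SCALE * audio_tagging

# "Epoch 48, batch 4500": loss=0.05341, simple=0.07351, pruned=0.00918, at=0.007472
print(round(combined_loss(0.07351, 0.00918, 0.007472), 5))  # -> 0.05341
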
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:52:15,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3799160.0, ans=0.2 2023-11-29 03:52:19,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-29 03:52:20,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-29 03:52:31,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3799293.3333333335, ans=0.1 2023-11-29 03:52:35,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3799293.3333333335, ans=0.125 2023-11-29 03:52:36,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-29 03:52:42,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3799360.0, ans=0.125 2023-11-29 03:52:51,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3799360.0, ans=0.125 2023-11-29 03:52:59,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3799426.6666666665, ans=0.125 2023-11-29 03:53:09,925 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4800, loss[loss=0.06611, simple_loss=0.09393, pruned_loss=0.01184, audio_tagging_loss=0.00731, over 15370.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0883, pruned_loss=0.0119, audio_tagging_loss=0.008722, over 3040308.98 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:53:12,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3799493.3333333335, ans=0.125 2023-11-29 03:53:33,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3799626.6666666665, ans=0.0 2023-11-29 03:53:37,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.976e+01 9.656e+01 1.035e+02 1.213e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 03:53:38,641 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-29 03:54:03,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3799760.0, ans=0.0 2023-11-29 03:54:06,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2023-11-29 03:54:11,304 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4850, loss[loss=0.06838, simple_loss=0.09656, pruned_loss=0.01276, audio_tagging_loss=0.007347, over 15977.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08856, pruned_loss=0.01197, audio_tagging_loss=0.008835, over 3037257.70 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:54:21,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3799826.6666666665, ans=0.1 2023-11-29 03:54:25,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.68 vs. 
limit=6.0 2023-11-29 03:54:39,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3799960.0, ans=0.2 2023-11-29 03:54:42,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-29 03:55:09,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3800093.3333333335, ans=0.1 2023-11-29 03:55:10,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3800093.3333333335, ans=0.2 2023-11-29 03:55:13,231 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4900, loss[loss=0.04458, simple_loss=0.05921, pruned_loss=0.004837, audio_tagging_loss=0.01014, over 15984.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08866, pruned_loss=0.01191, audio_tagging_loss=0.008908, over 3046153.26 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:55:17,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3800160.0, ans=0.125 2023-11-29 03:55:43,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.986e+01 9.622e+01 1.028e+02 1.398e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 03:55:44,506 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-29 03:55:50,730 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:55:51,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3800360.0, ans=0.0 2023-11-29 03:55:58,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3800360.0, ans=0.125 2023-11-29 03:56:18,013 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 4950, loss[loss=0.04603, simple_loss=0.06126, pruned_loss=0.005663, audio_tagging_loss=0.009737, over 14247.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08821, pruned_loss=0.01171, audio_tagging_loss=0.008782, over 3036075.66 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:56:29,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3800560.0, ans=0.025 2023-11-29 03:56:37,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3800560.0, ans=0.125 2023-11-29 03:56:47,172 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-29 03:57:03,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3800693.3333333335, ans=0.125 2023-11-29 03:57:09,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2023-11-29 03:57:15,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3800760.0, ans=0.1 2023-11-29 03:57:19,423 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5000, loss[loss=0.05147, simple_loss=0.07008, pruned_loss=0.01057, audio_tagging_loss=0.005864, over 15265.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08842, pruned_loss=0.01169, audio_tagging_loss=0.008558, over 3037660.79 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:57:22,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3800826.6666666665, ans=0.09899494936611666 2023-11-29 03:57:23,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3800826.6666666665, ans=0.125 2023-11-29 03:57:32,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3800893.3333333335, ans=0.2 2023-11-29 03:57:37,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3800893.3333333335, ans=0.05 2023-11-29 03:57:43,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3800960.0, ans=0.125 2023-11-29 03:57:47,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3800960.0, ans=0.125 2023-11-29 03:57:48,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.990e+01 9.495e+01 1.030e+02 1.330e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 03:57:50,043 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-29 03:57:50,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3800960.0, ans=0.125 2023-11-29 03:57:55,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2023-11-29 03:58:01,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3801026.6666666665, ans=0.125 2023-11-29 03:58:04,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3801026.6666666665, ans=0.125 2023-11-29 03:58:21,208 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5050, loss[loss=0.05741, simple_loss=0.07513, pruned_loss=0.009946, audio_tagging_loss=0.009896, over 14728.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08961, pruned_loss=0.01199, audio_tagging_loss=0.008503, over 3035639.97 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:58:21,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3801160.0, ans=0.0 2023-11-29 03:58:41,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3801226.6666666665, ans=0.125 2023-11-29 03:58:50,472 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-29 03:58:51,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. 
limit=15.0 2023-11-29 03:58:51,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3801293.3333333335, ans=0.02 2023-11-29 03:58:51,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3801293.3333333335, ans=10.0 2023-11-29 03:59:03,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3801360.0, ans=0.125 2023-11-29 03:59:13,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-11-29 03:59:20,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3801426.6666666665, ans=0.125 2023-11-29 03:59:22,909 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5100, loss[loss=0.07892, simple_loss=0.1127, pruned_loss=0.01758, audio_tagging_loss=0.004984, over 14314.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08861, pruned_loss=0.01185, audio_tagging_loss=0.008489, over 3032509.04 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:59:29,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=22.5 2023-11-29 03:59:33,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3801560.0, ans=0.1 2023-11-29 03:59:50,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 9.192e+01 9.667e+01 1.067e+02 2.138e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-29 03:59:51,779 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-29 03:59:57,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-29 04:00:01,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3801693.3333333335, ans=0.125 2023-11-29 04:00:08,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3801693.3333333335, ans=0.125 2023-11-29 04:00:11,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3801760.0, ans=0.0 2023-11-29 04:00:13,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801760.0, ans=0.125 2023-11-29 04:00:23,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-29 04:00:24,154 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5150, loss[loss=0.05948, simple_loss=0.08153, pruned_loss=0.00765, audio_tagging_loss=0.01107, over 16207.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08907, pruned_loss=0.01184, audio_tagging_loss=0.008374, over 3036178.67 frames. 
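
[Annotation] Most of the volume in this log is scaling.py printing the current value ("ans") of each ScheduledFloat at the given batch_count. As I read icefall's scaling.py, a ScheduledFloat is a piecewise-linear function of the training batch count, clamped at its end points; the breakpoints in the sketch below are hypothetical, chosen only to show why a dropout_p can print a flat ans=0.1 by batch_count ~3.8M:

# Hypothetical sketch of a ScheduledFloat-style piecewise-linear schedule.
# The (batch_count, value) breakpoints here are illustrative, not the
# recipe's; the real schedules live in scaling.py / the zipformer config.
from bisect import bisect_right

def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    xs = [x for x, _ in points]
    if batch_count <= xs[0]:
        return points[0][1]
    if batch_count >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, batch_count)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout decaying 0.3 -> 0.1 over the first 20k batches stays pinned at
# 0.1 for the rest of training, hence the constant ans=0.1 lines above.
print(scheduled_float(3_797_626.67, [(0.0, 0.3), (20_000.0, 0.1)]))  # -> 0.1
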
], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:00:27,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3801826.6666666665, ans=0.0 2023-11-29 04:00:36,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3801893.3333333335, ans=0.125 2023-11-29 04:00:50,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0 2023-11-29 04:00:53,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-29 04:00:58,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5 2023-11-29 04:01:07,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3802026.6666666665, ans=0.125 2023-11-29 04:01:18,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3802093.3333333335, ans=0.0 2023-11-29 04:01:24,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-29 04:01:25,314 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5200, loss[loss=0.09436, simple_loss=0.1386, pruned_loss=0.01938, audio_tagging_loss=0.005707, over 15012.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08982, pruned_loss=0.01188, audio_tagging_loss=0.008305, over 3039215.55 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:01:37,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2023-11-29 04:01:42,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=22.5 2023-11-29 04:01:43,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-29 04:01:54,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.016e+01 9.699e+01 1.038e+02 1.418e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 04:01:55,305 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-29 04:02:13,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2023-11-29 04:02:13,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2023-11-29 04:02:19,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-29 04:02:26,909 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5250, loss[loss=0.06242, simple_loss=0.08917, pruned_loss=0.0112, audio_tagging_loss=0.006635, over 14991.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08829, pruned_loss=0.01161, audio_tagging_loss=0.008376, over 3038870.36 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:02:36,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3802493.3333333335, ans=0.0 2023-11-29 04:02:45,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3802560.0, ans=0.1 2023-11-29 04:02:48,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2023-11-29 04:02:56,725 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-29 04:03:17,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3802760.0, ans=0.125 2023-11-29 04:03:28,968 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5300, loss[loss=0.07394, simple_loss=0.09881, pruned_loss=0.01597, audio_tagging_loss=0.00857, over 15411.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08861, pruned_loss=0.01187, audio_tagging_loss=0.008494, over 3035637.90 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:03:30,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3802826.6666666665, ans=0.2 2023-11-29 04:03:32,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3802826.6666666665, ans=0.125 2023-11-29 04:03:38,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3802826.6666666665, ans=0.0 2023-11-29 04:03:45,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3802893.3333333335, ans=0.125 2023-11-29 04:03:52,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3802960.0, ans=0.2 2023-11-29 04:03:53,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-29 04:03:57,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 9.143e+01 9.635e+01 1.038e+02 1.334e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 04:03:57,892 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-29 04:04:14,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. limit=10.0 2023-11-29 04:04:16,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3803026.6666666665, ans=0.125 2023-11-29 04:04:16,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3803026.6666666665, ans=0.0 2023-11-29 04:04:27,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-29 04:04:29,664 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5350, loss[loss=0.06878, simple_loss=0.09337, pruned_loss=0.01429, audio_tagging_loss=0.007808, over 14841.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08985, pruned_loss=0.0121, audio_tagging_loss=0.008449, over 3041164.34 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:04:36,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3803160.0, ans=0.1 2023-11-29 04:04:47,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3803226.6666666665, ans=0.125 2023-11-29 04:04:57,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3803293.3333333335, ans=0.125 2023-11-29 04:04:58,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3803293.3333333335, ans=15.0 2023-11-29 04:05:00,561 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-29 04:05:18,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2023-11-29 04:05:25,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-11-29 04:05:31,590 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5400, loss[loss=0.0797, simple_loss=0.1196, pruned_loss=0.01565, audio_tagging_loss=0.004236, over 15712.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08986, pruned_loss=0.01209, audio_tagging_loss=0.008515, over 3044595.05 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:05:31,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3803493.3333333335, ans=0.1 2023-11-29 04:05:33,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-29 04:06:01,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.108e+01 9.705e+01 1.034e+02 1.334e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 04:06:01,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-29 04:06:09,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3803693.3333333335, ans=0.125 2023-11-29 04:06:33,409 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5450, loss[loss=0.07686, simple_loss=0.1109, pruned_loss=0.01425, audio_tagging_loss=0.00718, over 16276.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08984, pruned_loss=0.01215, audio_tagging_loss=0.008543, over 3053333.10 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:06:43,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3803826.6666666665, ans=0.0 2023-11-29 04:07:02,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3803960.0, ans=0.1 2023-11-29 04:07:03,396 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-29 04:07:06,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.68 vs. 
limit=15.0 2023-11-29 04:07:35,615 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5500, loss[loss=0.04971, simple_loss=0.06825, pruned_loss=0.005518, audio_tagging_loss=0.01006, over 15751.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08946, pruned_loss=0.01198, audio_tagging_loss=0.008621, over 3058310.61 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:08:01,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3804293.3333333335, ans=0.125 2023-11-29 04:08:05,682 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-29 04:08:06,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.091e+01 9.676e+01 1.052e+02 2.081e+02, threshold=1.935e+02, percent-clipped=1.0 2023-11-29 04:08:30,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3804426.6666666665, ans=0.125 2023-11-29 04:08:34,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-29 04:08:37,450 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5550, loss[loss=0.06465, simple_loss=0.08045, pruned_loss=0.01615, audio_tagging_loss=0.008269, over 14380.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0897, pruned_loss=0.01202, audio_tagging_loss=0.008714, over 3062380.25 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 04:08:52,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3804560.0, ans=0.125 2023-11-29 04:09:07,026 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-29 04:09:11,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-29 04:09:39,312 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5600, loss[loss=0.09023, simple_loss=0.1247, pruned_loss=0.0217, audio_tagging_loss=0.006184, over 16038.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08978, pruned_loss=0.01198, audio_tagging_loss=0.008712, over 3053205.26 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:09:39,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3804826.6666666665, ans=0.2 2023-11-29 04:09:40,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3804826.6666666665, ans=0.0 2023-11-29 04:09:43,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.41 vs. 
limit=15.0 2023-11-29 04:09:48,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3804826.6666666665, ans=0.125 2023-11-29 04:10:07,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3804960.0, ans=0.125 2023-11-29 04:10:08,927 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-29 04:10:09,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.028e+01 9.748e+01 1.040e+02 1.265e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:10:26,166 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:10:32,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-29 04:10:33,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3805093.3333333335, ans=0.0 2023-11-29 04:10:35,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3805093.3333333335, ans=0.1 2023-11-29 04:10:37,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-29 04:10:40,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3805160.0, ans=0.125 2023-11-29 04:10:40,934 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5650, loss[loss=0.0656, simple_loss=0.08274, pruned_loss=0.01631, audio_tagging_loss=0.007926, over 14024.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08812, pruned_loss=0.01177, audio_tagging_loss=0.008855, over 3052452.43 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:10:53,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3805226.6666666665, ans=0.125 2023-11-29 04:11:07,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-29 04:11:08,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3805293.3333333335, ans=0.1 2023-11-29 04:11:10,757 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-29 04:11:18,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3805360.0, ans=0.125 2023-11-29 04:11:26,335 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:11:42,456 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5700, loss[loss=0.07858, simple_loss=0.1109, pruned_loss=0.01568, audio_tagging_loss=0.007448, over 16141.00 frames. 
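
[Annotation] The periodic "Clipping_scale=2.0, grad-norm quartiles ..." lines summarize the distribution of recent gradient norms as (min, 25%, median, 75%, max), and the printed threshold tracks clipping_scale times the median: in the entry just above, 2.0 x 9.748e+01 gives the logged 1.950e+02. A sketch of that bookkeeping, assuming the scale-times-median rule rather than quoting optim.py:

# Sketch of the clipping diagnostics, assuming threshold = scale * median
# over a window of recent gradient norms (consistent with every threshold
# printed in this log); the authoritative logic is in optim.py.
import statistics

def clip_stats(recent_norms: list[float], clipping_scale: float = 2.0):
    q25, median, q75 = statistics.quantiles(recent_norms, n=4)
    quartiles = (min(recent_norms), q25, median, q75, max(recent_norms))
    threshold = clipping_scale * median
    percent_clipped = 100.0 * sum(n > threshold for n in recent_norms) / len(recent_norms)
    return quartiles, threshold, percent_clipped

norms = [77.0, 90.3, 97.5, 104.0, 126.5]  # shaped like the quartiles above
print(clip_stats(norms))                  # threshold = 2.0 * 97.5 = 195.0
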
], tot_loss[loss=0.06517, simple_loss=0.08891, pruned_loss=0.01191, audio_tagging_loss=0.008801, over 3058892.50 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:11:46,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3805493.3333333335, ans=0.0 2023-11-29 04:12:02,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3805560.0, ans=0.0 2023-11-29 04:12:11,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-29 04:12:11,953 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-29 04:12:13,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.102e+01 9.721e+01 1.096e+02 1.374e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 04:12:13,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.22 vs. limit=10.0 2023-11-29 04:12:15,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-29 04:12:21,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3805693.3333333335, ans=0.125 2023-11-29 04:12:42,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3805760.0, ans=0.0 2023-11-29 04:12:44,429 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5750, loss[loss=0.0586, simple_loss=0.07207, pruned_loss=0.00951, audio_tagging_loss=0.01305, over 14644.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08794, pruned_loss=0.01178, audio_tagging_loss=0.008731, over 3054747.30 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:12:50,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3805826.6666666665, ans=0.2 2023-11-29 04:12:54,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3805826.6666666665, ans=0.0 2023-11-29 04:12:56,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3805893.3333333335, ans=0.05 2023-11-29 04:13:13,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-29 04:13:16,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-29 04:13:38,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3806093.3333333335, ans=0.125 2023-11-29 04:13:44,372 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5800, loss[loss=0.04113, simple_loss=0.04897, pruned_loss=0.005955, audio_tagging_loss=0.01069, over 16248.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.0881, pruned_loss=0.01177, audio_tagging_loss=0.008637, over 3051733.12 frames. 
], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:13:45,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-29 04:13:47,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-29 04:13:49,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3806160.0, ans=0.125 2023-11-29 04:13:50,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3806160.0, ans=0.0 2023-11-29 04:13:58,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3806226.6666666665, ans=0.0 2023-11-29 04:14:14,913 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-29 04:14:15,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.950e+01 9.520e+01 1.017e+02 1.550e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 04:14:22,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3806360.0, ans=0.125 2023-11-29 04:14:25,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3806360.0, ans=0.125 2023-11-29 04:14:35,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:40,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3806426.6666666665, ans=0.125 2023-11-29 04:14:41,603 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:14:44,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:45,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806493.3333333335, ans=0.1 2023-11-29 04:14:45,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3806493.3333333335, ans=0.025 2023-11-29 04:14:46,533 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5850, loss[loss=0.05686, simple_loss=0.08502, pruned_loss=0.007561, audio_tagging_loss=0.006789, over 14526.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08852, pruned_loss=0.01177, audio_tagging_loss=0.008608, over 3045175.54 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:14:48,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3806493.3333333335, ans=0.2 2023-11-29 04:15:15,910 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-29 04:15:16,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3806626.6666666665, ans=0.1 2023-11-29 04:15:23,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3806693.3333333335, ans=0.2 2023-11-29 04:15:26,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-29 04:15:30,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3806693.3333333335, ans=0.125 2023-11-29 04:15:33,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2023-11-29 04:15:35,339 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:15:49,162 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5900, loss[loss=0.06423, simple_loss=0.08761, pruned_loss=0.01191, audio_tagging_loss=0.008511, over 14959.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0884, pruned_loss=0.01178, audio_tagging_loss=0.008573, over 3039641.99 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:16:10,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3806893.3333333335, ans=0.0 2023-11-29 04:16:10,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3806893.3333333335, ans=0.125 2023-11-29 04:16:16,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3806960.0, ans=0.125 2023-11-29 04:16:17,738 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-29 04:16:18,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 9.359e+01 9.876e+01 1.067e+02 1.252e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 04:16:26,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3807026.6666666665, ans=0.1 2023-11-29 04:16:30,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2023-11-29 04:16:45,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3807093.3333333335, ans=0.0 2023-11-29 04:16:47,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-11-29 04:16:50,136 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 5950, loss[loss=0.0434, simple_loss=0.04877, pruned_loss=0.004871, audio_tagging_loss=0.01414, over 14579.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08817, pruned_loss=0.01186, audio_tagging_loss=0.00865, over 3041216.76 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:17:12,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-11-29 04:17:19,991 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-29 04:17:42,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3807426.6666666665, ans=0.2 2023-11-29 04:17:51,347 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6000, loss[loss=0.0501, simple_loss=0.06385, pruned_loss=0.005991, audio_tagging_loss=0.01218, over 16258.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.0874, pruned_loss=0.0117, audio_tagging_loss=0.008633, over 3034840.53 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:17:51,348 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 04:18:31,357 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05827, simple_loss=0.05042, pruned_loss=0.005313, audio_tagging_loss=0.02774, over 4681554.00 frames. 2023-11-29 04:18:31,358 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 04:18:31,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3807493.3333333335, ans=0.04949747468305833 2023-11-29 04:18:42,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3807560.0, ans=0.125 2023-11-29 04:18:42,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3807560.0, ans=0.0 2023-11-29 04:18:49,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3807560.0, ans=0.125 2023-11-29 04:19:00,373 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-29 04:19:01,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.999e+01 9.693e+01 1.031e+02 2.165e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-29 04:19:12,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3807693.3333333335, ans=0.2 2023-11-29 04:19:19,652 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:19:32,555 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6050, loss[loss=0.07217, simple_loss=0.09462, pruned_loss=0.01528, audio_tagging_loss=0.009576, over 15564.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08763, pruned_loss=0.01175, audio_tagging_loss=0.008601, over 3035702.50 frames. 
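
[Annotation] The WARNING above is the recipe's length filter firing on an AudioSet placeholder cut: a 1 s cut has 100 feature frames, about 23 after the ~4x convolutional subsampling, which is fewer than the 24 BPE tokens of the dummy transcript, so the transducer loss would be undefined and the cut is dropped. A sketch of such a filter; the subsampling formula below is an assumed stand-in that maps 100 -> 23 as in the logged example, not the recipe's exact expression:

# Sketch of the length filter behind these WARNINGs: a transducer needs at
# least as many (subsampled) frames as output tokens. The formula below is
# an assumed stand-in for the encoder-embed subsampling; it maps 100 -> 23
# as in the logged example.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23
print(keep_cut(100, 24))              # -> False: excluded, as logged
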
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:19:40,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3807826.6666666665, ans=0.2 2023-11-29 04:19:54,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3807893.3333333335, ans=0.125 2023-11-29 04:20:02,424 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-29 04:20:31,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3808093.3333333335, ans=0.0 2023-11-29 04:20:31,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3808093.3333333335, ans=0.1 2023-11-29 04:20:34,985 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6100, loss[loss=0.08382, simple_loss=0.1193, pruned_loss=0.01543, audio_tagging_loss=0.00874, over 15825.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08819, pruned_loss=0.01173, audio_tagging_loss=0.008627, over 3038631.06 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:20:58,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-11-29 04:20:59,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2023-11-29 04:20:59,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.76 vs. limit=10.0 2023-11-29 04:21:04,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3808293.3333333335, ans=0.125 2023-11-29 04:21:05,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-29 04:21:07,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.969e+01 9.609e+01 1.049e+02 1.338e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 04:21:31,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-29 04:21:37,905 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6150, loss[loss=0.0681, simple_loss=0.1, pruned_loss=0.01049, audio_tagging_loss=0.007599, over 14933.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08892, pruned_loss=0.01181, audio_tagging_loss=0.008561, over 3039136.79 frames. 
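
[Annotation] The learning rate printed with each loss line ticks from 1.41e-03 here down to 1.40e-03 around batch 6200 below. With 'base_lr': 0.045, 'lr_batches': 7500 and 'lr_epochs': 3.5 from the header, the value is consistent with an Eden-style schedule that decays with both the global batch count and the epoch; a sketch under that assumption (the scheduler's exact epoch/step accounting may differ):

# Eden-style schedule (assumed): lr decays in both batch count and epoch,
# lr = base * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.25
# with B = lr_batches = 7500 and E = lr_epochs = 3.5 from the header.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~47 completed epochs and ~570k updates lands near the logged 1.41e-03:
print(f"{eden_lr(0.045, 570_000, 47.0):.2e}")  # -> 1.41e-03
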
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:22:02,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3808626.6666666665, ans=0.125 2023-11-29 04:22:05,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3808626.6666666665, ans=0.1 2023-11-29 04:22:07,234 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-29 04:22:15,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3808693.3333333335, ans=0.04949747468305833 2023-11-29 04:22:18,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3808693.3333333335, ans=0.125 2023-11-29 04:22:26,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3808760.0, ans=10.0 2023-11-29 04:22:38,909 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6200, loss[loss=0.05968, simple_loss=0.07979, pruned_loss=0.0112, audio_tagging_loss=0.008578, over 14617.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08827, pruned_loss=0.01188, audio_tagging_loss=0.00873, over 3037984.30 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:22:45,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3808826.6666666665, ans=0.09899494936611666 2023-11-29 04:22:56,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3808893.3333333335, ans=0.09899494936611666 2023-11-29 04:22:56,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3808893.3333333335, ans=0.1 2023-11-29 04:23:08,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-29 04:23:10,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.947e+01 9.565e+01 1.046e+02 1.413e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:23:15,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3809026.6666666665, ans=0.125 2023-11-29 04:23:40,229 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6250, loss[loss=0.08076, simple_loss=0.116, pruned_loss=0.01518, audio_tagging_loss=0.007589, over 15605.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08806, pruned_loss=0.01181, audio_tagging_loss=0.008768, over 3040555.01 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:23:47,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3809160.0, ans=0.125 2023-11-29 04:24:10,232 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-29 04:24:15,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3809293.3333333335, ans=0.0 2023-11-29 04:24:32,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3809426.6666666665, ans=0.125 2023-11-29 04:24:39,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=15.0 2023-11-29 04:24:41,950 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6300, loss[loss=0.06504, simple_loss=0.07341, pruned_loss=0.01594, audio_tagging_loss=0.0124, over 14480.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.0886, pruned_loss=0.01188, audio_tagging_loss=0.008853, over 3042663.27 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:24:47,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-29 04:24:56,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.56 vs. limit=15.0 2023-11-29 04:25:05,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.99 vs. limit=15.0 2023-11-29 04:25:10,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3809626.6666666665, ans=0.125 2023-11-29 04:25:11,580 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-29 04:25:13,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.159e+01 9.734e+01 1.043e+02 1.366e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 04:25:26,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.51 vs. limit=6.0 2023-11-29 04:25:43,855 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6350, loss[loss=0.1004, simple_loss=0.1334, pruned_loss=0.02639, audio_tagging_loss=0.007327, over 15587.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08865, pruned_loss=0.01195, audio_tagging_loss=0.008877, over 3044344.97 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:26:12,669 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-29 04:26:16,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0 2023-11-29 04:26:24,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-29 04:26:33,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3810093.3333333335, ans=0.025 2023-11-29 04:26:45,491 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6400, loss[loss=0.05092, simple_loss=0.05859, pruned_loss=0.01011, audio_tagging_loss=0.01152, over 14682.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08842, pruned_loss=0.01184, audio_tagging_loss=0.008903, over 3037836.79 frames. 
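
[Annotation] The "Whitening: ... metric=X vs. limit=Y" lines compare a measured anisotropy statistic of a module's activations against that module's configured limit; scaling.py's Whiten only applies a corrective gradient when the metric exceeds the limit, which is why only the occasional module is reported. The statistic sketched below, mean(eig^2)/mean(eig)^2 of the channel covariance (exactly 1.0 for perfectly white features), is my reading of the metric and may not match scaling.py in detail:

# A whitening metric in the spirit of the "metric=... vs. limit=..." lines:
# mean(eig^2) / mean(eig)^2 of the channel covariance is 1.0 for white
# features and grows with anisotropy. My reading of Whiten, not its code.
import numpy as np

def whitening_metric(x: np.ndarray) -> float:
    x = x - x.mean(axis=0)                # x: (num_frames, num_channels)
    cov = (x.T @ x) / len(x)
    eigs = np.linalg.eigvalsh(cov)
    return float(np.mean(eigs ** 2) / np.mean(eigs) ** 2)

rng = np.random.default_rng(0)
white = rng.standard_normal((4000, 192))
spiky = white.copy()
spiky[:, 0] *= 20.0                       # one dominant direction
print(whitening_metric(white))            # ~1.0: well under a limit like 15.0
print(whitening_metric(spiky))            # ~90: this would trip the limit
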
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:27:14,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3810293.3333333335, ans=0.125 2023-11-29 04:27:15,301 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-29 04:27:17,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.796e+01 9.535e+01 1.038e+02 1.501e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 04:27:21,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3810360.0, ans=0.2 2023-11-29 04:27:36,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3810426.6666666665, ans=0.1 2023-11-29 04:27:46,663 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6450, loss[loss=0.06191, simple_loss=0.08187, pruned_loss=0.01197, audio_tagging_loss=0.009006, over 15914.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08872, pruned_loss=0.01186, audio_tagging_loss=0.008853, over 3037076.26 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:27:49,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3810493.3333333335, ans=0.125 2023-11-29 04:28:01,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3810560.0, ans=0.09899494936611666 2023-11-29 04:28:08,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3810560.0, ans=0.125 2023-11-29 04:28:16,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-29 04:28:49,437 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6500, loss[loss=0.06702, simple_loss=0.09241, pruned_loss=0.01373, audio_tagging_loss=0.00708, over 16280.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08926, pruned_loss=0.01193, audio_tagging_loss=0.008731, over 3031905.42 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:29:00,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3810893.3333333335, ans=0.125 2023-11-29 04:29:04,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2023-11-29 04:29:09,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3810893.3333333335, ans=0.125 2023-11-29 04:29:18,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-29 04:29:19,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-29 04:29:20,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.207e+01 9.940e+01 1.055e+02 1.312e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 04:29:32,258 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:29:45,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.17 vs. 
limit=15.0 2023-11-29 04:29:50,425 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6550, loss[loss=0.05175, simple_loss=0.06603, pruned_loss=0.009771, audio_tagging_loss=0.008966, over 14706.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08915, pruned_loss=0.01191, audio_tagging_loss=0.008593, over 3033275.75 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:30:04,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2023-11-29 04:30:10,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3811226.6666666665, ans=0.125 2023-11-29 04:30:20,586 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-29 04:30:37,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3811360.0, ans=0.125 2023-11-29 04:30:52,145 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6600, loss[loss=0.06107, simple_loss=0.08458, pruned_loss=0.008897, audio_tagging_loss=0.009885, over 14415.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08893, pruned_loss=0.01183, audio_tagging_loss=0.008504, over 3033627.85 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:31:02,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3811493.3333333335, ans=0.2 2023-11-29 04:31:19,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0 2023-11-29 04:31:19,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-29 04:31:22,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-29 04:31:22,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=12.0 2023-11-29 04:31:24,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.047e+01 9.716e+01 1.044e+02 1.337e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 04:31:26,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3811626.6666666665, ans=0.2 2023-11-29 04:31:38,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3811693.3333333335, ans=0.0 2023-11-29 04:31:44,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3811760.0, ans=0.05 2023-11-29 04:31:47,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3811760.0, ans=0.125 2023-11-29 04:31:54,312 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6650, loss[loss=0.05875, simple_loss=0.07935, pruned_loss=0.009824, audio_tagging_loss=0.009251, over 15562.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08891, pruned_loss=0.01191, audio_tagging_loss=0.008436, over 3037514.13 frames. 
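The recurring "Whitening: ... metric=X vs. limit=Y" entries measure how far a module's activations are from a whitened (isotropic) covariance: a metric of 1.0 means every eigenvalue of the per-group feature covariance is equal, and larger values mean a more lopsided spectrum relative to the logged limit. One plausible way to compute such a metric, offered as a sketch only (the actual scaling.py implementation may differ in details):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """E[lambda^2] / (E[lambda])^2 over the eigenvalues of each group's
    feature covariance; equals 1.0 iff the covariance is a multiple of I."""
    (num_frames, num_channels) = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)             # center each channel
    metrics = []
    for g in range(num_groups):
        c = x[:, g, :].T @ x[:, g, :] / num_frames  # (d, d) covariance
        d = c.shape[0]
        # sum(lambda^2) = ||C||_F^2 and sum(lambda) = trace(C), so this
        # ratio is d * ||C||_F^2 / trace(C)^2 = E[lambda^2] / E[lambda]^2.
        metrics.append(d * (c ** 2).sum() / c.trace() ** 2)
    return float(torch.stack(metrics).mean())

print(whitening_metric(torch.randn(1000, 256)))  # near 1.0 for white noise
```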
], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:32:01,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3811826.6666666665, ans=0.125 2023-11-29 04:32:18,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3811960.0, ans=0.2 2023-11-29 04:32:23,974 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-29 04:32:27,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3811960.0, ans=0.07 2023-11-29 04:32:29,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2023-11-29 04:32:31,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812026.6666666665, ans=0.1 2023-11-29 04:32:39,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3812026.6666666665, ans=0.0 2023-11-29 04:32:45,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2023-11-29 04:32:56,022 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6700, loss[loss=0.07213, simple_loss=0.09324, pruned_loss=0.01591, audio_tagging_loss=0.009602, over 14537.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08946, pruned_loss=0.01201, audio_tagging_loss=0.008388, over 3043117.13 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:33:09,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3812226.6666666665, ans=0.125 2023-11-29 04:33:14,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3812226.6666666665, ans=0.125 2023-11-29 04:33:25,708 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-29 04:33:29,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.065e+01 9.575e+01 1.004e+02 1.192e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 04:33:37,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3812360.0, ans=0.2 2023-11-29 04:33:50,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3812426.6666666665, ans=0.125 2023-11-29 04:33:57,359 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6750, loss[loss=0.05747, simple_loss=0.08833, pruned_loss=0.00723, audio_tagging_loss=0.006074, over 14790.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08894, pruned_loss=0.012, audio_tagging_loss=0.008434, over 3036893.22 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:34:01,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3812493.3333333335, ans=0.2 2023-11-29 04:34:26,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-29 04:34:59,711 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6800, loss[loss=0.07274, simple_loss=0.1002, pruned_loss=0.01538, audio_tagging_loss=0.007243, over 13944.00 frames. 
], tot_loss[loss=0.06496, simple_loss=0.08927, pruned_loss=0.01195, audio_tagging_loss=0.008375, over 3031432.23 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:35:19,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3812893.3333333335, ans=0.07 2023-11-29 04:35:26,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=22.5 2023-11-29 04:35:29,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-29 04:35:32,506 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.971e+01 9.540e+01 1.002e+02 2.888e+02, threshold=1.908e+02, percent-clipped=1.0 2023-11-29 04:35:40,404 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:36:00,797 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6850, loss[loss=0.05537, simple_loss=0.07803, pruned_loss=0.00672, audio_tagging_loss=0.009635, over 14401.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08918, pruned_loss=0.01195, audio_tagging_loss=0.008385, over 3035897.10 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:36:30,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-29 04:36:36,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-29 04:36:39,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3813360.0, ans=0.125 2023-11-29 04:36:45,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3813360.0, ans=0.125 2023-11-29 04:36:53,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3813426.6666666665, ans=0.0 2023-11-29 04:37:05,100 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6900, loss[loss=0.0672, simple_loss=0.1001, pruned_loss=0.01171, audio_tagging_loss=0.005414, over 14260.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08966, pruned_loss=0.01196, audio_tagging_loss=0.008339, over 3038050.02 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:37:21,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-11-29 04:37:31,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3813626.6666666665, ans=0.0 2023-11-29 04:37:34,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-29 04:37:34,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3813626.6666666665, ans=0.125 2023-11-29 04:37:38,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.005e+01 9.691e+01 1.035e+02 1.354e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 04:37:47,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. 
limit=15.0 2023-11-29 04:37:55,734 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:38:06,655 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 6950, loss[loss=0.06937, simple_loss=0.09948, pruned_loss=0.01244, audio_tagging_loss=0.00719, over 16043.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08985, pruned_loss=0.01186, audio_tagging_loss=0.008404, over 3037668.28 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:38:06,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3813826.6666666665, ans=0.125 2023-11-29 04:38:25,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3813893.3333333335, ans=0.2 2023-11-29 04:38:36,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-29 04:38:43,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3814026.6666666665, ans=0.125 2023-11-29 04:38:56,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-29 04:39:01,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-29 04:39:07,970 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7000, loss[loss=0.05193, simple_loss=0.07579, pruned_loss=0.007632, audio_tagging_loss=0.006401, over 15307.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08931, pruned_loss=0.01172, audio_tagging_loss=0.008438, over 3033774.17 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:39:11,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3814160.0, ans=0.0 2023-11-29 04:39:16,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3814160.0, ans=0.07 2023-11-29 04:39:33,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814293.3333333335, ans=0.1 2023-11-29 04:39:38,338 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-29 04:39:43,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.947e+01 9.387e+01 1.017e+02 2.856e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-29 04:39:46,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3814360.0, ans=0.0 2023-11-29 04:40:10,551 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7050, loss[loss=0.07578, simple_loss=0.1061, pruned_loss=0.01535, audio_tagging_loss=0.007393, over 15997.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.0896, pruned_loss=0.01183, audio_tagging_loss=0.00846, over 3034563.71 frames. 
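The WARNING above also shows why such cuts are dropped: the 100 input frames shrink to 23 frames after subsampling, fewer than the 24 BPE tokens of the placeholder transcript, so no transducer alignment exists. A sketch of that filter, with a length formula chosen to reproduce the 100 -> 23 mapping in the warning (the training script's exact formula may differ):

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose encoder output is shorter than the token sequence."""
    # Assumed ~4x subsampling with a small boundary trim; (100 - 7) // 4 = 23,
    # reproducing the before/after frame counts in the warning above.
    num_frames_after = (num_frames - 7) // 4
    # A transducer needs at least one encoder frame per emitted token.
    return num_frames_after >= num_tokens

print(keep_cut(100, 24))   # False -> excluded, as in the log (23 < 24)
print(keep_cut(1000, 24))  # True
```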
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:40:11,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2023-11-29 04:40:39,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-29 04:41:00,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3814760.0, ans=0.125 2023-11-29 04:41:12,080 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7100, loss[loss=0.05314, simple_loss=0.07032, pruned_loss=0.009238, audio_tagging_loss=0.008737, over 14159.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.089, pruned_loss=0.01179, audio_tagging_loss=0.008503, over 3031412.26 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:41:15,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3814826.6666666665, ans=0.125 2023-11-29 04:41:36,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3814960.0, ans=0.0 2023-11-29 04:41:38,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814960.0, ans=0.1 2023-11-29 04:41:40,533 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-29 04:41:42,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3814960.0, ans=0.0 2023-11-29 04:41:47,409 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.957e+01 9.566e+01 1.017e+02 1.804e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:41:58,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3815026.6666666665, ans=0.0 2023-11-29 04:42:09,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-29 04:42:13,127 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7150, loss[loss=0.07, simple_loss=0.09375, pruned_loss=0.01535, audio_tagging_loss=0.007779, over 13894.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08976, pruned_loss=0.01195, audio_tagging_loss=0.00852, over 3036559.27 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:42:27,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3815226.6666666665, ans=0.0 2023-11-29 04:42:41,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0 2023-11-29 04:42:42,959 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-29 04:42:44,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3815293.3333333335, ans=0.2 2023-11-29 04:43:10,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3815426.6666666665, ans=0.1 2023-11-29 04:43:13,886 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7200, loss[loss=0.0561, simple_loss=0.06724, pruned_loss=0.01222, audio_tagging_loss=0.01026, over 15320.00 frames. 
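The grad_scale field in the batch summaries (32.0, then 16.0, then 8.0 above, later climbing back) reads as the dynamic loss scale of fp16 training: it is halved when scaled gradients overflow and grows again after a run of clean steps, producing exactly this sawtooth. A minimal sketch of that mechanism with torch.cuda.amp (generic usage on a GPU, not the training script's actual loop):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)
model = torch.nn.Linear(10, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(3):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 10, device="cuda")).square().mean()
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(opt)                 # skips the step if grads overflowed
    scaler.update()                  # halves or grows the scale accordingly
    print(step, scaler.get_scale())  # the value reported as grad_scale
```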
], tot_loss[loss=0.06604, simple_loss=0.09048, pruned_loss=0.01225, audio_tagging_loss=0.00855, over 3036434.44 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:43:34,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3815560.0, ans=0.125 2023-11-29 04:43:44,256 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-29 04:43:50,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.002e+01 9.674e+01 1.041e+02 1.826e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 04:43:54,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3815693.3333333335, ans=0.95 2023-11-29 04:44:08,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3815760.0, ans=0.1 2023-11-29 04:44:15,633 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7250, loss[loss=0.06624, simple_loss=0.08277, pruned_loss=0.01418, audio_tagging_loss=0.01068, over 15727.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08977, pruned_loss=0.01212, audio_tagging_loss=0.008676, over 3043216.01 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:44:22,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3815826.6666666665, ans=0.125 2023-11-29 04:44:24,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3815826.6666666665, ans=0.1 2023-11-29 04:44:36,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2023-11-29 04:44:37,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3815893.3333333335, ans=0.0 2023-11-29 04:44:38,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3815960.0, ans=0.125 2023-11-29 04:44:40,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-29 04:44:44,368 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-29 04:45:05,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3816093.3333333335, ans=0.125 2023-11-29 04:45:18,393 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7300, loss[loss=0.08302, simple_loss=0.1191, pruned_loss=0.01749, audio_tagging_loss=0.00596, over 15740.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08928, pruned_loss=0.012, audio_tagging_loss=0.00864, over 3038361.27 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:45:21,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3816160.0, ans=0.1 2023-11-29 04:45:35,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3816226.6666666665, ans=0.125 2023-11-29 04:45:48,201 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-29 04:45:48,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3816293.3333333335, ans=0.0 2023-11-29 04:45:53,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3816293.3333333335, ans=0.1 2023-11-29 04:45:54,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.998e+01 9.655e+01 1.011e+02 1.283e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 04:46:04,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3816360.0, ans=0.125 2023-11-29 04:46:15,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3816426.6666666665, ans=0.07 2023-11-29 04:46:19,744 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7350, loss[loss=0.07449, simple_loss=0.1071, pruned_loss=0.01344, audio_tagging_loss=0.00748, over 16856.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08906, pruned_loss=0.01204, audio_tagging_loss=0.008524, over 3047671.11 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:46:26,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2023-11-29 04:46:32,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3816560.0, ans=0.1 2023-11-29 04:46:49,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3816626.6666666665, ans=0.0 2023-11-29 04:46:50,117 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-29 04:47:03,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=8.0 2023-11-29 04:47:21,244 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7400, loss[loss=0.06059, simple_loss=0.08299, pruned_loss=0.01255, audio_tagging_loss=0.006541, over 13688.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08803, pruned_loss=0.01187, audio_tagging_loss=0.00845, over 3043369.13 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:47:41,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3816893.3333333335, ans=0.1 2023-11-29 04:47:46,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.41 vs. 
limit=15.0 2023-11-29 04:47:51,299 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-29 04:47:56,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.857e+01 9.571e+01 1.032e+02 1.214e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 04:48:00,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.84 vs. limit=6.0 2023-11-29 04:48:08,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2023-11-29 04:48:23,989 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7450, loss[loss=0.05128, simple_loss=0.06482, pruned_loss=0.00861, audio_tagging_loss=0.01026, over 15443.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08885, pruned_loss=0.01195, audio_tagging_loss=0.008337, over 3048301.21 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:48:40,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3817226.6666666665, ans=0.0 2023-11-29 04:48:52,810 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-29 04:49:03,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3817360.0, ans=0.1 2023-11-29 04:49:06,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3817360.0, ans=0.05 2023-11-29 04:49:15,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-29 04:49:16,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-11-29 04:49:19,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-29 04:49:25,579 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7500, loss[loss=0.07728, simple_loss=0.1051, pruned_loss=0.01655, audio_tagging_loss=0.008157, over 14369.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08921, pruned_loss=0.01208, audio_tagging_loss=0.008312, over 3047565.07 frames. 
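The ScheduledFloat entries record hyperparameters (dropout probabilities, skip rates, balancer limits) that vary with batch_count instead of being constants; each entry prints the current value as "ans". A sketch of a piecewise-linear schedule of that general shape (the breakpoints below are invented for illustration, not read from the model):

```python
import bisect

class PiecewiseLinear:
    """value(batch_count) interpolated between (count, value) breakpoints,
    clamped to the end values outside the covered range."""
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
        return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

# Hypothetical dropout schedule: 0.3 at the start of training, 0.1 from
# batch 20000 onward; this far into training it returns the final value.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3817360.0))  # 0.1, the kind of "ans=0.1" value logged above
```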
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:49:43,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817560.0, ans=0.1 2023-11-29 04:49:44,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3817560.0, ans=0.0 2023-11-29 04:49:48,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3817560.0, ans=0.1 2023-11-29 04:49:56,472 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-29 04:50:02,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 9.110e+01 9.749e+01 1.048e+02 1.256e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:50:07,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3817693.3333333335, ans=0.0 2023-11-29 04:50:27,275 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7550, loss[loss=0.05239, simple_loss=0.07776, pruned_loss=0.005317, audio_tagging_loss=0.008194, over 15219.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08935, pruned_loss=0.01217, audio_tagging_loss=0.008272, over 3044533.77 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:50:30,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3817826.6666666665, ans=0.0 2023-11-29 04:50:36,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3817826.6666666665, ans=0.0 2023-11-29 04:50:43,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-29 04:50:51,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3817960.0, ans=0.125 2023-11-29 04:50:57,324 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-29 04:51:29,835 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7600, loss[loss=0.06014, simple_loss=0.07745, pruned_loss=0.01053, audio_tagging_loss=0.01088, over 15292.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08977, pruned_loss=0.01218, audio_tagging_loss=0.008312, over 3046605.00 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:51:33,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.51 vs. limit=15.0 2023-11-29 04:51:35,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-11-29 04:51:49,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.78 vs. 
limit=15.0 2023-11-29 04:51:58,857 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-29 04:52:04,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.852e+01 9.526e+01 1.029e+02 1.380e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 04:52:17,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3818360.0, ans=0.125 2023-11-29 04:52:19,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3818426.6666666665, ans=0.1 2023-11-29 04:52:26,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3818426.6666666665, ans=0.0 2023-11-29 04:52:30,911 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7650, loss[loss=0.07334, simple_loss=0.1044, pruned_loss=0.0132, audio_tagging_loss=0.007942, over 16385.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08932, pruned_loss=0.01214, audio_tagging_loss=0.008385, over 3045700.90 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:52:31,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3818493.3333333335, ans=0.125 2023-11-29 04:52:37,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-29 04:52:39,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3818493.3333333335, ans=0.0 2023-11-29 04:52:42,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3818560.0, ans=0.125 2023-11-29 04:52:47,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-29 04:53:00,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-29 04:53:30,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3818760.0, ans=0.025 2023-11-29 04:53:32,458 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7700, loss[loss=0.06768, simple_loss=0.08607, pruned_loss=0.01465, audio_tagging_loss=0.009996, over 14002.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08937, pruned_loss=0.01204, audio_tagging_loss=0.008451, over 3044541.70 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:53:44,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3818893.3333333335, ans=0.125 2023-11-29 04:54:02,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-29 04:54:09,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.101e+01 9.082e+01 9.588e+01 1.045e+02 1.280e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 04:54:15,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3819026.6666666665, ans=0.1 2023-11-29 04:54:17,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3819026.6666666665, ans=0.125 2023-11-29 04:54:34,874 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7750, loss[loss=0.0693, simple_loss=0.09901, pruned_loss=0.01397, audio_tagging_loss=0.00583, over 16628.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08975, pruned_loss=0.01203, audio_tagging_loss=0.008466, over 3047623.98 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:54:52,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3819226.6666666665, ans=0.125 2023-11-29 04:54:54,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3819226.6666666665, ans=0.0 2023-11-29 04:55:04,146 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-29 04:55:27,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3819426.6666666665, ans=0.125 2023-11-29 04:55:31,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3819426.6666666665, ans=0.04949747468305833 2023-11-29 04:55:36,102 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7800, loss[loss=0.06807, simple_loss=0.09392, pruned_loss=0.01292, audio_tagging_loss=0.008195, over 15992.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08996, pruned_loss=0.01203, audio_tagging_loss=0.008491, over 3042490.17 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:55:37,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.51 vs. limit=6.0 2023-11-29 04:56:05,509 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-29 04:56:10,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-29 04:56:12,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.34 vs. 
limit=12.0 2023-11-29 04:56:12,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.184e+01 1.003e+02 1.060e+02 1.343e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-29 04:56:15,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3819693.3333333335, ans=0.125 2023-11-29 04:56:37,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3819826.6666666665, ans=0.125 2023-11-29 04:56:37,831 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7850, loss[loss=0.07276, simple_loss=0.09603, pruned_loss=0.01527, audio_tagging_loss=0.009469, over 15854.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09033, pruned_loss=0.01215, audio_tagging_loss=0.008483, over 3046280.33 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:56:44,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=12.0 2023-11-29 04:56:46,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3819826.6666666665, ans=0.0 2023-11-29 04:57:05,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-29 04:57:07,166 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-29 04:57:09,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3819960.0, ans=0.05 2023-11-29 04:57:23,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=12.0 2023-11-29 04:57:39,587 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7900, loss[loss=0.07185, simple_loss=0.1023, pruned_loss=0.01406, audio_tagging_loss=0.006661, over 15313.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09051, pruned_loss=0.0121, audio_tagging_loss=0.008571, over 3047138.03 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:57:39,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3820160.0, ans=0.0 2023-11-29 04:57:40,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-29 04:57:46,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2023-11-29 04:57:51,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. 
limit=15.0 2023-11-29 04:58:09,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573050 2023-11-29 04:58:10,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3820293.3333333335, ans=0.0 2023-11-29 04:58:16,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.085e+01 9.812e+01 1.049e+02 1.531e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 04:58:24,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3820360.0, ans=0.0 2023-11-29 04:58:25,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3820360.0, ans=0.125 2023-11-29 04:58:28,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3820426.6666666665, ans=0.0 2023-11-29 04:58:35,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3820426.6666666665, ans=0.125 2023-11-29 04:58:37,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3820426.6666666665, ans=0.125 2023-11-29 04:58:41,053 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 7950, loss[loss=0.06315, simple_loss=0.08084, pruned_loss=0.01345, audio_tagging_loss=0.009284, over 15810.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09087, pruned_loss=0.01215, audio_tagging_loss=0.00867, over 3054094.94 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:58:42,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2023-11-29 04:58:44,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3820493.3333333335, ans=0.125 2023-11-29 04:58:52,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3820560.0, ans=0.125 2023-11-29 04:58:52,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3820560.0, ans=0.2 2023-11-29 04:58:54,001 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:59:00,135 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 04:59:04,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3820626.6666666665, ans=0.125 2023-11-29 04:59:05,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3820626.6666666665, ans=0.0 2023-11-29 04:59:07,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3820626.6666666665, ans=0.1 2023-11-29 04:59:11,295 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573100 2023-11-29 04:59:16,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5 2023-11-29 04:59:20,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3820693.3333333335, ans=0.125 2023-11-29 04:59:21,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3820693.3333333335, ans=0.1 2023-11-29 04:59:26,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3820693.3333333335, ans=0.0 2023-11-29 04:59:43,500 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8000, loss[loss=0.05545, simple_loss=0.07569, pruned_loss=0.008062, audio_tagging_loss=0.009539, over 15803.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08926, pruned_loss=0.01196, audio_tagging_loss=0.008777, over 3048854.32 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:59:52,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=15.0 2023-11-29 05:00:08,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3820960.0, ans=0.0 2023-11-29 05:00:12,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573150 2023-11-29 05:00:20,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3821026.6666666665, ans=0.125 2023-11-29 05:00:20,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 9.160e+01 9.620e+01 1.029e+02 4.171e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-29 05:00:24,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2023-11-29 05:00:26,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821026.6666666665, ans=0.1 2023-11-29 05:00:34,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-29 05:00:45,130 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8050, loss[loss=0.06556, simple_loss=0.08654, pruned_loss=0.01259, audio_tagging_loss=0.009709, over 14698.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08879, pruned_loss=0.01195, audio_tagging_loss=0.008814, over 3048264.48 frames. 
], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:01:14,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573200 2023-11-29 05:01:19,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3821293.3333333335, ans=0.125 2023-11-29 05:01:26,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3821360.0, ans=0.125 2023-11-29 05:01:31,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3821360.0, ans=0.5 2023-11-29 05:01:33,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3821426.6666666665, ans=0.125 2023-11-29 05:01:47,021 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8100, loss[loss=0.04082, simple_loss=0.05333, pruned_loss=0.006396, audio_tagging_loss=0.007761, over 14048.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08774, pruned_loss=0.01181, audio_tagging_loss=0.008856, over 3048573.50 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:01:47,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821493.3333333335, ans=0.1 2023-11-29 05:02:01,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3821560.0, ans=0.04949747468305833 2023-11-29 05:02:16,358 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573250 2023-11-29 05:02:18,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821626.6666666665, ans=0.1 2023-11-29 05:02:19,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3821626.6666666665, ans=0.125 2023-11-29 05:02:20,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3821626.6666666665, ans=0.0 2023-11-29 05:02:25,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 9.031e+01 9.567e+01 1.056e+02 1.290e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 05:02:35,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-29 05:02:48,031 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8150, loss[loss=0.06293, simple_loss=0.08099, pruned_loss=0.0113, audio_tagging_loss=0.01114, over 14442.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08851, pruned_loss=0.01188, audio_tagging_loss=0.008723, over 3051266.58 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:03:01,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3821893.3333333335, ans=0.0 2023-11-29 05:03:18,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573300 2023-11-29 05:03:30,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3822026.6666666665, ans=0.125 2023-11-29 05:03:35,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3822026.6666666665, ans=0.07 2023-11-29 05:03:38,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3822093.3333333335, ans=0.2 2023-11-29 05:03:40,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3822093.3333333335, ans=0.125 2023-11-29 05:03:41,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0 2023-11-29 05:03:42,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3822093.3333333335, ans=0.125 2023-11-29 05:03:50,154 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8200, loss[loss=0.05849, simple_loss=0.08158, pruned_loss=0.008158, audio_tagging_loss=0.00954, over 15323.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08792, pruned_loss=0.01174, audio_tagging_loss=0.0087, over 3051502.84 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:03:54,239 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:04:11,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3822226.6666666665, ans=0.125 2023-11-29 05:04:19,289 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573350 2023-11-29 05:04:26,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-29 05:04:27,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.111e+01 9.648e+01 1.058e+02 1.357e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 05:04:32,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3822360.0, ans=0.0 2023-11-29 05:04:35,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3822360.0, ans=0.125 2023-11-29 05:04:51,500 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8250, loss[loss=0.05498, simple_loss=0.0747, pruned_loss=0.006878, audio_tagging_loss=0.01075, over 15817.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08925, pruned_loss=0.01179, audio_tagging_loss=0.008552, over 3053789.92 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:04:58,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3822493.3333333335, ans=6.0 2023-11-29 05:05:07,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3822560.0, ans=0.125 2023-11-29 05:05:10,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3822560.0, ans=0.09899494936611666 2023-11-29 05:05:17,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3822626.6666666665, ans=0.125 2023-11-29 05:05:20,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3822626.6666666665, ans=0.5 2023-11-29 05:05:21,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573400 2023-11-29 05:05:26,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822626.6666666665, ans=0.1 2023-11-29 05:05:30,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-11-29 05:05:38,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3822693.3333333335, ans=0.125 2023-11-29 05:05:52,745 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8300, loss[loss=0.05848, simple_loss=0.08119, pruned_loss=0.008232, audio_tagging_loss=0.009659, over 15142.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08893, pruned_loss=0.01183, audio_tagging_loss=0.008584, over 3048311.24 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:05:56,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3822826.6666666665, ans=0.125 2023-11-29 05:06:12,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3822893.3333333335, ans=0.1 2023-11-29 05:06:23,366 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573450 2023-11-29 05:06:25,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3822960.0, ans=0.1 2023-11-29 05:06:26,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3822960.0, ans=0.2 2023-11-29 05:06:31,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.946e+01 9.758e+01 1.060e+02 1.383e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 05:06:31,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3823026.6666666665, ans=0.0 2023-11-29 05:06:55,004 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8350, loss[loss=0.06889, simple_loss=0.08763, pruned_loss=0.01663, audio_tagging_loss=0.00845, over 15643.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08832, pruned_loss=0.0118, audio_tagging_loss=0.00855, over 3045997.33 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:06:56,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3823160.0, ans=0.125 2023-11-29 05:06:59,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3823160.0, ans=0.125 2023-11-29 05:07:24,422 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-29 05:07:29,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3823293.3333333335, ans=0.125 2023-11-29 05:07:39,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-29 05:07:50,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-29 05:07:51,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-29 05:07:57,401 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8400, loss[loss=0.08646, simple_loss=0.1213, pruned_loss=0.01903, audio_tagging_loss=0.006771, over 16411.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08873, pruned_loss=0.01189, audio_tagging_loss=0.008565, over 3042757.60 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:08:07,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-11-29 05:08:16,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2023-11-29 05:08:18,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3823560.0, ans=0.0 2023-11-29 05:08:25,954 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573550 2023-11-29 05:08:36,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 9.025e+01 9.772e+01 1.057e+02 1.487e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 05:08:40,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3823693.3333333335, ans=0.125 2023-11-29 05:08:56,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3823826.6666666665, ans=0.125 2023-11-29 05:08:56,909 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8450, loss[loss=0.08357, simple_loss=0.1135, pruned_loss=0.02012, audio_tagging_loss=0.006682, over 14964.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08929, pruned_loss=0.01204, audio_tagging_loss=0.00852, over 3047339.43 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:09:12,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3823893.3333333335, ans=0.125 2023-11-29 05:09:28,182 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573600 2023-11-29 05:09:35,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. 
limit=15.0 2023-11-29 05:09:36,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3824026.6666666665, ans=0.0 2023-11-29 05:09:59,992 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8500, loss[loss=0.06367, simple_loss=0.08352, pruned_loss=0.01477, audio_tagging_loss=0.007136, over 14667.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08955, pruned_loss=0.01203, audio_tagging_loss=0.008452, over 3041524.97 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:10:00,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-11-29 05:10:22,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3824226.6666666665, ans=0.0 2023-11-29 05:10:29,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-29 05:10:32,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3824293.3333333335, ans=0.125 2023-11-29 05:10:38,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3824360.0, ans=0.125 2023-11-29 05:10:39,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.190e+01 9.692e+01 1.077e+02 1.317e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 05:11:02,963 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8550, loss[loss=0.0541, simple_loss=0.08038, pruned_loss=0.004965, audio_tagging_loss=0.00894, over 14833.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09042, pruned_loss=0.01212, audio_tagging_loss=0.008484, over 3058895.19 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:11:04,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3824493.3333333335, ans=0.0 2023-11-29 05:11:09,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3824493.3333333335, ans=0.025 2023-11-29 05:11:12,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3824493.3333333335, ans=0.125 2023-11-29 05:11:20,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2023-11-29 05:11:31,567 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573700 2023-11-29 05:12:03,575 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8600, loss[loss=0.0596, simple_loss=0.08033, pruned_loss=0.009607, audio_tagging_loss=0.009826, over 16115.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09028, pruned_loss=0.01214, audio_tagging_loss=0.008545, over 3060107.03 frames. 
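[Editor's note] The per-batch loss fields logged by train_asr.py:1235 are internally consistent with a weighted sum of the pruned-transducer components plus the audio-tagging loss, with a 0.5 weight on simple_loss. The weights below are inferred from the logged numbers (e.g. the batch-8500 entry just above), not read from the training code:

```python
# The logged losses satisfy, to display precision:
#   loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
# The scales here are inferred from the log, not taken from train_asr.py.

def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)


# Reproduces the batch-8500 entry above:
# 0.5 * 0.08352 + 0.01477 + 0.007136 = 0.06367
print(round(combine_losses(0.08352, 0.01477, 0.007136), 5))
```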
], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:12:26,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3824893.3333333335, ans=0.125 2023-11-29 05:12:30,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3824960.0, ans=0.125 2023-11-29 05:12:33,562 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573750 2023-11-29 05:12:40,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3825026.6666666665, ans=0.2 2023-11-29 05:12:42,133 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:12:44,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.900e+01 9.530e+01 1.037e+02 1.292e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 05:12:52,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3825093.3333333335, ans=0.0 2023-11-29 05:12:57,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3825093.3333333335, ans=0.125 2023-11-29 05:12:59,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3825093.3333333335, ans=0.2 2023-11-29 05:13:04,718 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8650, loss[loss=0.06417, simple_loss=0.0807, pruned_loss=0.01401, audio_tagging_loss=0.009809, over 15646.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09003, pruned_loss=0.01203, audio_tagging_loss=0.008564, over 3059543.96 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:13:08,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3825160.0, ans=0.125 2023-11-29 05:13:12,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3825160.0, ans=0.0 2023-11-29 05:13:17,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825226.6666666665, ans=0.1 2023-11-29 05:13:34,652 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573800 2023-11-29 05:13:43,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2023-11-29 05:13:45,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3825360.0, ans=0.125 2023-11-29 05:13:46,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825360.0, ans=0.1 2023-11-29 05:14:06,961 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8700, loss[loss=0.05086, simple_loss=0.06726, pruned_loss=0.006664, audio_tagging_loss=0.01057, over 14362.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08938, pruned_loss=0.01195, audio_tagging_loss=0.008679, over 3057187.06 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:14:25,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. 
limit=22.5 2023-11-29 05:14:36,375 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573850 2023-11-29 05:14:37,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3825626.6666666665, ans=0.0 2023-11-29 05:14:41,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3825626.6666666665, ans=0.125 2023-11-29 05:14:47,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.152e+01 9.894e+01 1.070e+02 1.338e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:15:08,744 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8750, loss[loss=0.07139, simple_loss=0.1027, pruned_loss=0.01382, audio_tagging_loss=0.006229, over 14334.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08971, pruned_loss=0.01212, audio_tagging_loss=0.008718, over 3048029.85 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:15:13,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3825826.6666666665, ans=0.2 2023-11-29 05:15:26,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3825893.3333333335, ans=0.125 2023-11-29 05:15:36,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2023-11-29 05:15:37,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-29 05:15:45,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826026.6666666665, ans=0.1 2023-11-29 05:15:47,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-29 05:15:53,324 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:15:54,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3826026.6666666665, ans=0.2 2023-11-29 05:15:58,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-11-29 05:16:00,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3826093.3333333335, ans=0.125 2023-11-29 05:16:02,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3826093.3333333335, ans=0.0 2023-11-29 05:16:10,231 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8800, loss[loss=0.0589, simple_loss=0.07419, pruned_loss=0.01024, audio_tagging_loss=0.01157, over 15241.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0902, pruned_loss=0.01234, audio_tagging_loss=0.008767, over 3040832.19 frames. 
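[Editor's note] In the optim.py:476 lines, the five grad-norm quartiles read like min/25%/median/75%/max of recently observed gradient norms, and the reported threshold equals Clipping_scale times the median (2.0 × 9.894e+01 = 1.979e+02 in the entry above). A sketch of that adaptive clipping rule; the window size and bookkeeping details are assumptions:

```python
# Sketch of median-based adaptive gradient clipping matching the
# "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines above.
# The window length and exact statistics kept are assumptions.
from collections import deque

import torch


class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return 0.0
        # Global norm over all parameter gradients.
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:  # counted in "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
```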
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:16:36,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3826293.3333333335, ans=0.125 2023-11-29 05:16:39,834 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-29 05:16:41,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3826293.3333333335, ans=0.0 2023-11-29 05:16:44,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3826293.3333333335, ans=0.0 2023-11-29 05:16:48,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0 2023-11-29 05:16:50,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.120e+01 9.746e+01 1.050e+02 1.300e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 05:17:11,285 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8850, loss[loss=0.06431, simple_loss=0.091, pruned_loss=0.01011, audio_tagging_loss=0.008697, over 15474.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09032, pruned_loss=0.01222, audio_tagging_loss=0.008788, over 3036625.77 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:17:26,593 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:17:29,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3826560.0, ans=0.125 2023-11-29 05:17:31,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=12.0 2023-11-29 05:17:40,754 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-29 05:17:44,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3826626.6666666665, ans=0.125 2023-11-29 05:17:58,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3826693.3333333335, ans=0.2 2023-11-29 05:18:14,004 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8900, loss[loss=0.05444, simple_loss=0.07679, pruned_loss=0.008126, audio_tagging_loss=0.007914, over 15674.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09018, pruned_loss=0.01214, audio_tagging_loss=0.008583, over 3042768.10 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:18:17,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826826.6666666665, ans=0.1 2023-11-29 05:18:22,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.01 vs. 
limit=15.0 2023-11-29 05:18:24,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3826893.3333333335, ans=0.125 2023-11-29 05:18:31,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3826893.3333333335, ans=0.125 2023-11-29 05:18:34,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3826893.3333333335, ans=0.2 2023-11-29 05:18:37,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3826960.0, ans=0.0 2023-11-29 05:18:38,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3826960.0, ans=0.125 2023-11-29 05:18:41,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-11-29 05:18:43,748 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-29 05:18:54,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.202e+01 9.774e+01 1.025e+02 3.343e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-29 05:19:15,222 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 8950, loss[loss=0.04763, simple_loss=0.06148, pruned_loss=0.008087, audio_tagging_loss=0.008801, over 16316.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09009, pruned_loss=0.01222, audio_tagging_loss=0.008403, over 3043464.12 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:19:21,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3827160.0, ans=0.125 2023-11-29 05:19:26,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827226.6666666665, ans=0.1 2023-11-29 05:19:29,842 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:19:35,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=15.0 2023-11-29 05:19:40,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3827293.3333333335, ans=0.1 2023-11-29 05:19:45,632 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574100 2023-11-29 05:19:49,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3827293.3333333335, ans=0.0 2023-11-29 05:19:50,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=22.5 2023-11-29 05:20:05,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3827426.6666666665, ans=0.2 2023-11-29 05:20:17,590 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9000, loss[loss=0.06295, simple_loss=0.08244, pruned_loss=0.01204, audio_tagging_loss=0.009689, over 16696.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09077, pruned_loss=0.01241, audio_tagging_loss=0.008423, over 3043881.26 frames. 
], batch size: 63, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:20:17,591 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 05:20:56,938 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05922, simple_loss=0.05036, pruned_loss=0.00529, audio_tagging_loss=0.02875, over 4681554.00 frames. 2023-11-29 05:20:56,938 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 05:21:03,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3827493.3333333335, ans=0.125 2023-11-29 05:21:23,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3827626.6666666665, ans=0.0 2023-11-29 05:21:25,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=22.5 2023-11-29 05:21:26,381 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574150 2023-11-29 05:21:27,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3827626.6666666665, ans=0.125 2023-11-29 05:21:34,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3827693.3333333335, ans=0.125 2023-11-29 05:21:37,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 9.287e+01 9.829e+01 1.058e+02 1.335e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 05:21:58,596 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9050, loss[loss=0.0727, simple_loss=0.1024, pruned_loss=0.01469, audio_tagging_loss=0.006818, over 14873.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09041, pruned_loss=0.01222, audio_tagging_loss=0.008303, over 3034516.76 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:22:08,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3827826.6666666665, ans=0.0 2023-11-29 05:22:10,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2023-11-29 05:22:18,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3827893.3333333335, ans=0.0 2023-11-29 05:22:26,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3827960.0, ans=0.0 2023-11-29 05:22:27,902 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574200 2023-11-29 05:22:28,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3827960.0, ans=0.125 2023-11-29 05:23:00,529 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9100, loss[loss=0.06214, simple_loss=0.0845, pruned_loss=0.009546, audio_tagging_loss=0.01034, over 16253.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08989, pruned_loss=0.01201, audio_tagging_loss=0.00834, over 3043762.24 frames. 
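[Editor's note] At batch 9000 the loop pauses to compute the validation loss and then reports peak GPU memory, as in the train_asr.py:1258/1267/1268 lines above. A sketch of that periodic hook; the 3000-batch cadence and the helper names are assumptions:

```python
# Sketch of the periodic validation + memory report seen at batch 9000
# above. VALID_INTERVAL and the helper signatures are assumptions;
# compute_loss is taken to return a float per batch.
import torch

VALID_INTERVAL = 3000  # assumed cadence


def maybe_validate(batch_idx, model, valid_loader, compute_loss, logger):
    if batch_idx == 0 or batch_idx % VALID_INTERVAL != 0:
        return
    logger.info("Computing validation loss")
    model.eval()
    with torch.no_grad():
        losses = [compute_loss(model, batch) for batch in valid_loader]
    model.train()
    logger.info(f"validation: loss={sum(losses) / len(losses):.4g}")
    # Matches the "Maximum memory allocated so far is ...MB" line:
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    logger.info(f"Maximum memory allocated so far is {mb}MB")
```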
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:23:04,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828160.0, ans=0.125 2023-11-29 05:23:29,730 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574250 2023-11-29 05:23:35,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3828293.3333333335, ans=0.02 2023-11-29 05:23:38,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3828360.0, ans=0.2 2023-11-29 05:23:40,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.515e+01 1.034e+02 1.309e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 05:23:47,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3828360.0, ans=0.0 2023-11-29 05:23:54,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3828426.6666666665, ans=0.1 2023-11-29 05:23:55,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3828426.6666666665, ans=0.1 2023-11-29 05:23:56,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2023-11-29 05:24:02,102 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9150, loss[loss=0.04915, simple_loss=0.06402, pruned_loss=0.007986, audio_tagging_loss=0.009153, over 14711.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08934, pruned_loss=0.01191, audio_tagging_loss=0.008445, over 3047405.34 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:24:04,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3828493.3333333335, ans=0.0 2023-11-29 05:24:06,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3828493.3333333335, ans=0.1 2023-11-29 05:24:32,094 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574300 2023-11-29 05:24:44,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3828693.3333333335, ans=0.125 2023-11-29 05:25:04,027 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9200, loss[loss=0.04422, simple_loss=0.05361, pruned_loss=0.008974, audio_tagging_loss=0.008441, over 13636.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08934, pruned_loss=0.01205, audio_tagging_loss=0.00843, over 3046021.14 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:25:16,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2023-11-29 05:25:27,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. 
limit=22.5 2023-11-29 05:25:32,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3828960.0, ans=0.0 2023-11-29 05:25:33,876 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-29 05:25:40,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3829026.6666666665, ans=0.0 2023-11-29 05:25:44,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 8.927e+01 9.501e+01 1.029e+02 1.392e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 05:25:53,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2023-11-29 05:26:02,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829093.3333333335, ans=0.1 2023-11-29 05:26:06,042 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9250, loss[loss=0.06647, simple_loss=0.09348, pruned_loss=0.01312, audio_tagging_loss=0.006607, over 15259.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08873, pruned_loss=0.01193, audio_tagging_loss=0.008561, over 3050771.17 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:26:14,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-29 05:26:17,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3829226.6666666665, ans=0.0 2023-11-29 05:26:28,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2023-11-29 05:26:29,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3829293.3333333335, ans=0.125 2023-11-29 05:26:35,711 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-29 05:26:35,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3829293.3333333335, ans=0.125 2023-11-29 05:26:39,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3829293.3333333335, ans=0.0 2023-11-29 05:26:45,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2023-11-29 05:26:54,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-29 05:26:55,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3829426.6666666665, ans=0.0 2023-11-29 05:26:57,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829426.6666666665, ans=0.1 2023-11-29 05:27:08,233 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9300, loss[loss=0.06518, simple_loss=0.08552, pruned_loss=0.01472, audio_tagging_loss=0.007697, over 14813.00 frames. 
], tot_loss[loss=0.06495, simple_loss=0.08893, pruned_loss=0.01191, audio_tagging_loss=0.008576, over 3049169.72 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:27:18,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3829560.0, ans=0.0 2023-11-29 05:27:28,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829560.0, ans=0.1 2023-11-29 05:27:37,426 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-29 05:27:37,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3829626.6666666665, ans=0.125 2023-11-29 05:27:51,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.967e+01 9.597e+01 1.017e+02 1.229e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-29 05:27:51,774 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:27:57,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3829760.0, ans=0.125 2023-11-29 05:28:00,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3829760.0, ans=0.125 2023-11-29 05:28:09,019 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9350, loss[loss=0.0511, simple_loss=0.07345, pruned_loss=0.006904, audio_tagging_loss=0.007475, over 15068.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08898, pruned_loss=0.01208, audio_tagging_loss=0.008578, over 3056888.40 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:28:14,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-29 05:28:16,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-29 05:28:18,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3829826.6666666665, ans=0.09899494936611666 2023-11-29 05:28:24,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-29 05:28:27,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3829893.3333333335, ans=0.0 2023-11-29 05:28:39,342 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-29 05:28:44,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3829960.0, ans=0.0 2023-11-29 05:28:59,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3830093.3333333335, ans=0.0 2023-11-29 05:29:10,055 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9400, loss[loss=0.06345, simple_loss=0.08906, pruned_loss=0.01159, audio_tagging_loss=0.007334, over 15202.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08862, pruned_loss=0.012, audio_tagging_loss=0.008579, over 3049521.10 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:29:11,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3830160.0, ans=0.1 2023-11-29 05:29:19,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3830160.0, ans=0.04949747468305833 2023-11-29 05:29:33,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3830293.3333333335, ans=0.0 2023-11-29 05:29:34,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3830293.3333333335, ans=0.125 2023-11-29 05:29:39,453 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574550 2023-11-29 05:29:53,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 9.033e+01 9.709e+01 1.034e+02 1.178e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 05:29:53,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3830360.0, ans=0.0 2023-11-29 05:30:07,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-29 05:30:12,147 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9450, loss[loss=0.0693, simple_loss=0.09529, pruned_loss=0.0134, audio_tagging_loss=0.00825, over 15016.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08828, pruned_loss=0.01192, audio_tagging_loss=0.008696, over 3048350.06 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:30:13,328 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:30:37,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3830626.6666666665, ans=0.09899494936611666 2023-11-29 05:30:38,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3830626.6666666665, ans=0.0 2023-11-29 05:30:41,528 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574600 2023-11-29 05:31:13,402 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9500, loss[loss=0.06657, simple_loss=0.09451, pruned_loss=0.01178, audio_tagging_loss=0.007536, over 16047.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08862, pruned_loss=0.01197, audio_tagging_loss=0.008652, over 3050478.97 frames. 
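[Editor's note] The recurring WARNING entries drop 1-second AudioSet cuts whose feature length after subsampling (100 → 23 frames) is shorter than the number of BPE tokens in their placeholder transcript (24); a transducer cannot align more output tokens than encoder frames. A sketch of such a filter; the subsampling formula is an assumption chosen to reproduce the 100 → 23 mapping:

```python
# Sketch of the filter behind the "Exclude cut ..." warnings above.
# The subsampling arithmetic is an assumption (effective factor ~4,
# mapping 100 input frames to 23 encoder frames).
import logging


def subsampled_length(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23


def keep_cut(num_frames: int, num_tokens: int, cut_id: str) -> bool:
    T = subsampled_length(num_frames)
    if T < num_tokens:
        logging.warning(
            f"Exclude cut with ID {cut_id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {T}. "
            f"Number of tokens: {num_tokens}")
        return False
    return True


print(keep_cut(100, 24, "unbalanced/example.wav"))  # False: 23 < 24
```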
], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:31:23,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3830826.6666666665, ans=0.125 2023-11-29 05:31:39,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3830960.0, ans=0.125 2023-11-29 05:31:44,311 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-29 05:31:45,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3830960.0, ans=0.125 2023-11-29 05:31:49,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3830960.0, ans=0.125 2023-11-29 05:31:56,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.911e+01 9.563e+01 1.027e+02 1.260e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 05:32:04,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3831093.3333333335, ans=0.0 2023-11-29 05:32:13,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831093.3333333335, ans=0.1 2023-11-29 05:32:13,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3831093.3333333335, ans=0.2 2023-11-29 05:32:15,831 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9550, loss[loss=0.0769, simple_loss=0.1039, pruned_loss=0.01673, audio_tagging_loss=0.008208, over 14408.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08884, pruned_loss=0.01193, audio_tagging_loss=0.008686, over 3044403.60 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:32:27,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3831226.6666666665, ans=0.0 2023-11-29 05:32:29,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-29 05:32:33,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-29 05:32:37,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5 2023-11-29 05:32:45,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-29 05:33:08,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3831426.6666666665, ans=0.2 2023-11-29 05:33:17,854 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9600, loss[loss=0.0783, simple_loss=0.1031, pruned_loss=0.01686, audio_tagging_loss=0.009909, over 14412.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08926, pruned_loss=0.01206, audio_tagging_loss=0.00872, over 3047353.04 frames. 
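[Editor's note] The grad_scale field attached to each loss line (8.0, 16.0, and 32.0 across this section) is the dynamic loss scale of fp16 training: it is halved when a step overflows and periodically doubled while steps stay finite. A minimal sketch with PyTorch's stock GradScaler; the growth settings are library defaults, not necessarily this recipe's:

```python
# Sketch of the mixed-precision step behind the fluctuating grad_scale
# values in the log. Uses PyTorch's standard API with default growth
# behaviour; this recipe's actual settings are not shown in the log.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)


def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # halves the scale on overflow, grows it later
    return loss.detach(), scaler.get_scale()
```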
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:33:39,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3831560.0, ans=0.1 2023-11-29 05:33:40,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-11-29 05:33:43,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-29 05:33:46,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-29 05:34:01,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.154e+01 9.787e+01 1.038e+02 1.402e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 05:34:11,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3831760.0, ans=0.125 2023-11-29 05:34:18,808 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9650, loss[loss=0.0781, simple_loss=0.1077, pruned_loss=0.01519, audio_tagging_loss=0.009041, over 14994.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08904, pruned_loss=0.01202, audio_tagging_loss=0.008776, over 3051446.82 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:34:19,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-11-29 05:34:27,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3831826.6666666665, ans=0.1 2023-11-29 05:34:50,470 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-29 05:34:54,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3831960.0, ans=0.2 2023-11-29 05:35:07,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2023-11-29 05:35:11,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3832093.3333333335, ans=0.0 2023-11-29 05:35:20,960 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9700, loss[loss=0.05948, simple_loss=0.07836, pruned_loss=0.01223, audio_tagging_loss=0.008066, over 15187.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08892, pruned_loss=0.01192, audio_tagging_loss=0.008655, over 3047648.67 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:35:28,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3832160.0, ans=0.0 2023-11-29 05:35:34,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.73 vs. limit=10.0 2023-11-29 05:35:39,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2023-11-29 05:35:50,742 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-29 05:36:03,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.065e+01 9.811e+01 1.054e+02 1.349e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 05:36:10,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3832426.6666666665, ans=0.1 2023-11-29 05:36:20,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-29 05:36:23,058 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9750, loss[loss=0.06496, simple_loss=0.08039, pruned_loss=0.01422, audio_tagging_loss=0.01054, over 14728.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08896, pruned_loss=0.01194, audio_tagging_loss=0.008572, over 3043540.12 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:36:23,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3832493.3333333335, ans=0.1 2023-11-29 05:36:42,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3832560.0, ans=0.125 2023-11-29 05:36:51,757 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574900 2023-11-29 05:37:13,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3832760.0, ans=0.125 2023-11-29 05:37:23,668 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9800, loss[loss=0.05986, simple_loss=0.08009, pruned_loss=0.008918, audio_tagging_loss=0.01089, over 15107.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08923, pruned_loss=0.01194, audio_tagging_loss=0.008494, over 3037243.53 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:37:40,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-29 05:37:41,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3832893.3333333335, ans=0.0 2023-11-29 05:37:41,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3832893.3333333335, ans=0.125 2023-11-29 05:37:41,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3832893.3333333335, ans=0.125 2023-11-29 05:37:43,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3832893.3333333335, ans=0.05 2023-11-29 05:37:52,480 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 574950 2023-11-29 05:38:00,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833026.6666666665, ans=0.1 2023-11-29 05:38:05,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 9.329e+01 9.816e+01 1.069e+02 1.352e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-29 05:38:20,071 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:38:23,378 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9850, loss[loss=0.06863, simple_loss=0.08352, pruned_loss=0.01687, audio_tagging_loss=0.01001, over 14374.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0894, pruned_loss=0.01195, audio_tagging_loss=0.008425, over 3045266.02 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:38:24,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3833160.0, ans=0.125 2023-11-29 05:38:35,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=22.5 2023-11-29 05:38:37,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-29 05:38:52,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-29 05:38:53,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575000 2023-11-29 05:39:06,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3833360.0, ans=0.2 2023-11-29 05:39:24,242 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9900, loss[loss=0.06564, simple_loss=0.08926, pruned_loss=0.01226, audio_tagging_loss=0.00875, over 15294.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.09001, pruned_loss=0.01199, audio_tagging_loss=0.008331, over 3050803.02 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:39:32,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3833493.3333333335, ans=0.0 2023-11-29 05:39:41,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3833560.0, ans=0.125 2023-11-29 05:39:44,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. 
limit=6.0 2023-11-29 05:39:45,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3833560.0, ans=0.125 2023-11-29 05:39:46,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3833560.0, ans=0.125 2023-11-29 05:39:53,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575050 2023-11-29 05:39:58,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3833626.6666666665, ans=0.0 2023-11-29 05:40:05,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3833693.3333333335, ans=0.2 2023-11-29 05:40:06,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.333e+01 9.894e+01 1.049e+02 1.495e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:40:16,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2023-11-29 05:40:17,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3833760.0, ans=0.125 2023-11-29 05:40:25,381 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 9950, loss[loss=0.06099, simple_loss=0.08197, pruned_loss=0.01053, audio_tagging_loss=0.009483, over 14792.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09049, pruned_loss=0.01216, audio_tagging_loss=0.008327, over 3049343.50 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:40:49,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3833960.0, ans=0.09899494936611666 2023-11-29 05:40:53,932 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575100 2023-11-29 05:41:07,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3834026.6666666665, ans=0.125 2023-11-29 05:41:12,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-29 05:41:16,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3834093.3333333335, ans=0.025 2023-11-29 05:41:21,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-29 05:41:25,618 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10000, loss[loss=0.06077, simple_loss=0.08099, pruned_loss=0.01058, audio_tagging_loss=0.009702, over 16080.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.09024, pruned_loss=0.012, audio_tagging_loss=0.008294, over 3047206.03 frames. 
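[Editor's note] The scaling.py:1022 Whitening lines compare a per-module statistic of the activation covariance against a limit. One plausible definition of such a metric (assumed here; the actual formula in scaling.py may differ) is the mean squared covariance eigenvalue divided by the squared mean eigenvalue, which is 1.0 for perfectly white channels and grows as channels become correlated:

```python
# Sketch of a whitening diagnostic like "metric=... vs. limit=..." above.
# The definition below is an assumption, not necessarily scaling.py's.
import torch


def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); center, then compare the spread of
    # covariance eigenvalues to their mean.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()


white = torch.randn(1000, 256)
correlated = white @ torch.randn(256, 256)
print(whitening_metric(white))       # close to 1.0
print(whitening_metric(correlated))  # much larger; would exceed a limit
```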
], batch size: 63, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:41:32,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3834160.0, ans=0.125 2023-11-29 05:41:43,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3834226.6666666665, ans=0.125 2023-11-29 05:41:55,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575150 2023-11-29 05:42:00,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3834293.3333333335, ans=0.1 2023-11-29 05:42:05,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3834360.0, ans=0.0 2023-11-29 05:42:08,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.882e+01 9.475e+01 1.008e+02 1.351e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-29 05:42:12,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3834360.0, ans=0.025 2023-11-29 05:42:26,225 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10050, loss[loss=0.07722, simple_loss=0.1124, pruned_loss=0.01336, audio_tagging_loss=0.007692, over 14161.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08957, pruned_loss=0.01195, audio_tagging_loss=0.00842, over 3039584.95 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:42:42,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3834560.0, ans=0.125 2023-11-29 05:42:43,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3834560.0, ans=0.025 2023-11-29 05:42:55,697 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575200 2023-11-29 05:43:06,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3834693.3333333335, ans=10.0 2023-11-29 05:43:14,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3834760.0, ans=0.0 2023-11-29 05:43:28,455 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10100, loss[loss=0.08749, simple_loss=0.1277, pruned_loss=0.01783, audio_tagging_loss=0.005816, over 15311.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08998, pruned_loss=0.01202, audio_tagging_loss=0.008372, over 3042072.30 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:43:37,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3834826.6666666665, ans=0.0 2023-11-29 05:43:37,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3834826.6666666665, ans=0.125 2023-11-29 05:43:53,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3834960.0, ans=0.125 2023-11-29 05:43:54,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=15.0 2023-11-29 05:43:56,912 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575250 2023-11-29 05:44:05,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-11-29 05:44:10,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 9.243e+01 9.943e+01 1.062e+02 1.322e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-29 05:44:20,578 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:44:20,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3835093.3333333335, ans=0.0 2023-11-29 05:44:21,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3835093.3333333335, ans=0.125 2023-11-29 05:44:28,588 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10150, loss[loss=0.06346, simple_loss=0.08095, pruned_loss=0.01322, audio_tagging_loss=0.009762, over 15762.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08985, pruned_loss=0.01218, audio_tagging_loss=0.008426, over 3045406.85 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:44:30,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3835160.0, ans=0.025 2023-11-29 05:44:40,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3835226.6666666665, ans=0.125 2023-11-29 05:44:57,819 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-29 05:45:00,048 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:45:06,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2023-11-29 05:45:28,668 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10200, loss[loss=0.05234, simple_loss=0.07048, pruned_loss=0.006339, audio_tagging_loss=0.01076, over 14680.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08908, pruned_loss=0.01196, audio_tagging_loss=0.008545, over 3052836.63 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:45:36,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.80 vs. 
limit=15.0 2023-11-29 05:45:42,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3835560.0, ans=0.1 2023-11-29 05:45:43,967 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:45:53,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3835626.6666666665, ans=0.0 2023-11-29 05:45:54,727 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:45:58,190 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-29 05:46:12,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 8.881e+01 9.698e+01 1.024e+02 1.374e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 05:46:16,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3835760.0, ans=0.125 2023-11-29 05:46:24,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3835760.0, ans=10.0 2023-11-29 05:46:29,774 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10250, loss[loss=0.07993, simple_loss=0.1005, pruned_loss=0.01921, audio_tagging_loss=0.01047, over 15025.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08997, pruned_loss=0.01215, audio_tagging_loss=0.008579, over 3058404.95 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:46:37,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3835826.6666666665, ans=0.0 2023-11-29 05:46:49,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3835893.3333333335, ans=0.125 2023-11-29 05:46:50,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3835893.3333333335, ans=0.0 2023-11-29 05:46:58,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-29 05:47:28,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3836093.3333333335, ans=0.125 2023-11-29 05:47:30,871 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10300, loss[loss=0.0775, simple_loss=0.1115, pruned_loss=0.01458, audio_tagging_loss=0.007185, over 15680.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09051, pruned_loss=0.01229, audio_tagging_loss=0.008575, over 3051040.86 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:47:38,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3836160.0, ans=0.125 2023-11-29 05:47:46,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-29 05:47:47,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-29 05:48:00,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-29 05:48:11,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3836360.0, ans=0.1 2023-11-29 05:48:14,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3836360.0, ans=0.125 2023-11-29 05:48:14,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.198e+01 9.831e+01 1.050e+02 1.376e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 05:48:31,818 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10350, loss[loss=0.06415, simple_loss=0.08779, pruned_loss=0.01151, audio_tagging_loss=0.008741, over 16796.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0895, pruned_loss=0.01211, audio_tagging_loss=0.008682, over 3051258.60 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:48:43,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2023-11-29 05:49:00,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3836626.6666666665, ans=0.09899494936611666 2023-11-29 05:49:01,320 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575500 2023-11-29 05:49:05,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3836626.6666666665, ans=0.125 2023-11-29 05:49:15,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3836693.3333333335, ans=0.1 2023-11-29 05:49:31,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3836826.6666666665, ans=0.125 2023-11-29 05:49:31,951 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10400, loss[loss=0.04797, simple_loss=0.06203, pruned_loss=0.008388, audio_tagging_loss=0.008561, over 14479.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08927, pruned_loss=0.01206, audio_tagging_loss=0.008748, over 3047356.32 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:49:35,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. 
limit=15.0 2023-11-29 05:50:00,920 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575550 2023-11-29 05:50:02,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3836960.0, ans=0.125 2023-11-29 05:50:15,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.181e+01 9.898e+01 1.077e+02 1.252e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-29 05:50:18,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3837026.6666666665, ans=0.07 2023-11-29 05:50:31,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837160.0, ans=0.1 2023-11-29 05:50:32,051 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10450, loss[loss=0.06483, simple_loss=0.08867, pruned_loss=0.01225, audio_tagging_loss=0.008243, over 14692.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0889, pruned_loss=0.01183, audio_tagging_loss=0.008725, over 3050534.97 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:50:45,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3837226.6666666665, ans=0.125 2023-11-29 05:50:48,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3837226.6666666665, ans=0.125 2023-11-29 05:50:55,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.07 vs. limit=10.0 2023-11-29 05:51:01,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837293.3333333335, ans=0.1 2023-11-29 05:51:02,145 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575600 2023-11-29 05:51:06,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3837293.3333333335, ans=0.0 2023-11-29 05:51:16,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3837360.0, ans=0.125 2023-11-29 05:51:33,387 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10500, loss[loss=0.09218, simple_loss=0.1316, pruned_loss=0.02077, audio_tagging_loss=0.005593, over 14917.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08956, pruned_loss=0.01197, audio_tagging_loss=0.00854, over 3044469.02 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:51:36,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. 
limit=15.0 2023-11-29 05:51:38,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3837493.3333333335, ans=0.125 2023-11-29 05:51:43,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3837493.3333333335, ans=0.0 2023-11-29 05:52:01,964 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575650 2023-11-29 05:52:05,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3837626.6666666665, ans=0.125 2023-11-29 05:52:13,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3837693.3333333335, ans=0.0 2023-11-29 05:52:16,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3837693.3333333335, ans=0.125 2023-11-29 05:52:16,854 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 9.059e+01 9.610e+01 1.014e+02 2.042e+02, threshold=1.922e+02, percent-clipped=1.0 2023-11-29 05:52:23,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837760.0, ans=0.1 2023-11-29 05:52:32,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3837760.0, ans=0.09899494936611666 2023-11-29 05:52:34,009 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10550, loss[loss=0.08124, simple_loss=0.1123, pruned_loss=0.01747, audio_tagging_loss=0.007615, over 15760.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08918, pruned_loss=0.01186, audio_tagging_loss=0.008412, over 3036139.72 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:52:41,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3837826.6666666665, ans=0.125 2023-11-29 05:52:44,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3837893.3333333335, ans=0.0 2023-11-29 05:52:53,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3837893.3333333335, ans=0.0 2023-11-29 05:52:54,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3837893.3333333335, ans=0.0 2023-11-29 05:53:03,022 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575700 2023-11-29 05:53:30,827 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:53:34,038 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10600, loss[loss=0.06175, simple_loss=0.0811, pruned_loss=0.01135, audio_tagging_loss=0.009847, over 15165.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08931, pruned_loss=0.01205, audio_tagging_loss=0.008338, over 3038063.49 frames. 
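The periodic optim.py records report five grad-norm quantiles plus a clipping threshold, and the threshold consistently sits near 2x the middle quantile, matching Clipping_scale=2.0. A sketch under that reading; the window size and the exact rule are assumptions, not the optimizer's verbatim logic:

    import collections
    import statistics
    import torch

    class GradNormClipper:
        """Sketch: clip to clipping_scale * median of recent gradient norms."""
        def __init__(self, clipping_scale=2.0, window=500):
            self.scale = clipping_scale
            self.norms = collections.deque(maxlen=window)
            self.seen = self.clipped = 0

        def __call__(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(p.grad) for p in params])).item()
            self.norms.append(norm)
            threshold = self.scale * statistics.median(self.norms)
            self.seen += 1
            if norm > threshold:          # rescale all grads above the threshold
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            # quartiles and percent-clipped, mirroring the logged fields
            qs = statistics.quantiles(self.norms, n=4) if len(self.norms) > 1 else [norm] * 3
            return [min(self.norms), *qs, max(self.norms)], threshold, 100.0 * self.clipped / self.seen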
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:54:04,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575750 2023-11-29 05:54:10,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3838360.0, ans=0.09899494936611666 2023-11-29 05:54:18,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.196e+01 9.659e+01 1.047e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-29 05:54:28,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3838426.6666666665, ans=0.0 2023-11-29 05:54:34,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2023-11-29 05:54:34,916 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10650, loss[loss=0.06584, simple_loss=0.09746, pruned_loss=0.0103, audio_tagging_loss=0.006808, over 15458.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08958, pruned_loss=0.01215, audio_tagging_loss=0.008383, over 3045277.05 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:54:54,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-29 05:55:03,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575800 2023-11-29 05:55:18,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5 2023-11-29 05:55:36,133 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10700, loss[loss=0.06736, simple_loss=0.09166, pruned_loss=0.01304, audio_tagging_loss=0.008485, over 14788.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09013, pruned_loss=0.01217, audio_tagging_loss=0.00837, over 3046372.42 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:56:04,221 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575850 2023-11-29 05:56:21,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.915e+01 9.906e+01 1.068e+02 1.666e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 05:56:25,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3839093.3333333335, ans=0.1 2023-11-29 05:56:26,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3839093.3333333335, ans=0.0 2023-11-29 05:56:35,685 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10750, loss[loss=0.05987, simple_loss=0.08939, pruned_loss=0.01049, audio_tagging_loss=0.004688, over 14756.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08937, pruned_loss=0.01197, audio_tagging_loss=0.008378, over 3053671.38 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:57:05,265 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575900 2023-11-29 05:57:05,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2023-11-29 05:57:08,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=15.0 2023-11-29 05:57:35,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3839493.3333333335, ans=0.125 2023-11-29 05:57:36,387 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10800, loss[loss=0.07322, simple_loss=0.09848, pruned_loss=0.01662, audio_tagging_loss=0.007356, over 15121.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08896, pruned_loss=0.01195, audio_tagging_loss=0.008355, over 3052787.42 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:57:37,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3839493.3333333335, ans=0.125 2023-11-29 05:57:39,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0 2023-11-29 05:57:50,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-29 05:58:04,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3839626.6666666665, ans=0.0 2023-11-29 05:58:04,887 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 575950 2023-11-29 05:58:21,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.991e+01 9.894e+01 1.056e+02 1.415e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:58:30,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3839760.0, ans=0.2 2023-11-29 05:58:34,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3839760.0, ans=0.125 2023-11-29 05:58:37,108 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10850, loss[loss=0.06793, simple_loss=0.09186, pruned_loss=0.01348, audio_tagging_loss=0.008522, over 15875.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08893, pruned_loss=0.01187, audio_tagging_loss=0.008428, over 3052240.67 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:58:38,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3839826.6666666665, ans=0.0 2023-11-29 05:59:05,491 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576000 2023-11-29 05:59:07,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3839960.0, ans=0.125 2023-11-29 05:59:39,153 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:59:40,193 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10900, loss[loss=0.07697, simple_loss=0.1066, pruned_loss=0.01803, audio_tagging_loss=0.005638, over 15596.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08991, pruned_loss=0.01203, audio_tagging_loss=0.008462, over 3053435.24 frames. 
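The WARNING records that exclude AudioSet cuts all share the same arithmetic: 100 input frames shrink to 23 after convolutional subsampling, which is fewer than the 24 BPE tokens of the placeholder transcript, so no monotonic alignment exists and the transducer loss cannot be computed. A sketch of such a filter, assuming a ((T - 7) // 2) // 2 subsampling formula that reproduces 100 -> 23 (the recipe's exact expression may differ):

    def frames_after_subsampling(t: int) -> int:
        # assumed Conv2dSubsampling-style shrinkage; ((100 - 7) // 2) // 2 == 23
        return ((t - 7) // 2) // 2

    def keep_cut(num_frames: int, num_tokens: int, cut_id: str) -> bool:
        t_sub = frames_after_subsampling(num_frames)
        if t_sub < num_tokens:
            print(f"WARNING Exclude cut with ID {cut_id} from training. "
                  f"Number of frames (after subsampling): {t_sub}. "
                  f"Number of tokens: {num_tokens}")
            return False
        return True

    assert not keep_cut(100, 24, "unbalanced/_eq1Ry0UZGU_0.000_1.000.wav")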
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:00:08,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3840293.3333333335, ans=0.0 2023-11-29 06:00:09,815 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576050 2023-11-29 06:00:10,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2023-11-29 06:00:27,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.980e+01 9.608e+01 1.052e+02 1.228e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 06:00:27,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3840360.0, ans=0.0 2023-11-29 06:00:41,327 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 10950, loss[loss=0.06117, simple_loss=0.085, pruned_loss=0.01061, audio_tagging_loss=0.008058, over 15661.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09016, pruned_loss=0.01196, audio_tagging_loss=0.008505, over 3049690.59 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:00:43,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5 2023-11-29 06:00:48,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3840493.3333333335, ans=0.0 2023-11-29 06:01:12,291 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576100 2023-11-29 06:01:13,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3840626.6666666665, ans=0.125 2023-11-29 06:01:16,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3840626.6666666665, ans=0.5 2023-11-29 06:01:24,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3840693.3333333335, ans=0.0 2023-11-29 06:01:30,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3840760.0, ans=0.125 2023-11-29 06:01:33,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3840760.0, ans=0.0 2023-11-29 06:01:36,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3840760.0, ans=0.0 2023-11-29 06:01:43,989 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11000, loss[loss=0.05542, simple_loss=0.06742, pruned_loss=0.00998, audio_tagging_loss=0.01173, over 16398.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08931, pruned_loss=0.01194, audio_tagging_loss=0.008619, over 3049804.06 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:01:56,974 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 06:01:58,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0 2023-11-29 06:02:03,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3840893.3333333335, ans=0.125 2023-11-29 06:02:07,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3840960.0, ans=0.0 2023-11-29 06:02:13,261 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576150 2023-11-29 06:02:30,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.865e+01 9.537e+01 1.028e+02 1.366e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 06:02:33,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3841093.3333333335, ans=0.0 2023-11-29 06:02:34,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-29 06:02:42,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3841093.3333333335, ans=0.1 2023-11-29 06:02:45,446 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11050, loss[loss=0.08064, simple_loss=0.1188, pruned_loss=0.01625, audio_tagging_loss=0.004964, over 16579.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08967, pruned_loss=0.01192, audio_tagging_loss=0.00863, over 3056082.62 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:02:53,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-29 06:02:57,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3841226.6666666665, ans=0.0 2023-11-29 06:02:59,915 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:03:04,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2023-11-29 06:03:12,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3841293.3333333335, ans=0.125 2023-11-29 06:03:13,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3841293.3333333335, ans=0.125 2023-11-29 06:03:14,279 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576200 2023-11-29 06:03:15,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=12.0 2023-11-29 06:03:17,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3841293.3333333335, ans=0.2 2023-11-29 06:03:22,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3841360.0, ans=0.0 2023-11-29 06:03:47,122 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11100, loss[loss=0.06999, simple_loss=0.09381, pruned_loss=0.01662, audio_tagging_loss=0.006462, over 13392.00 frames. 
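The dense scaling.py:213 records trace ScheduledFloat values: module hyperparameters such as skip rates, balancer probabilities, and dropout that are functions of batch_count rather than constants. A minimal sketch of piecewise-linear scheduling consistent with those records (the class in the codebase carries more machinery than this):

    class ScheduledFloat:
        """Float-like value interpolated piecewise-linearly over batch_count."""
        def __init__(self, *points):          # e.g. (0.0, 0.5), (4000.0, 0.05)
            self.points = sorted(points)
            self.batch_count = 0

        def __float__(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return float(pts[0][1])
            if self.batch_count >= pts[-1][0]:
                return float(pts[-1][1])
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    frac = (self.batch_count - x0) / (x1 - x0)
                    return y0 + frac * (y1 - y0)

    skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05))
    skip_rate.batch_count = 3_835_093   # far past the last point -> floor value
    print(float(skip_rate))             # 0.05, like the late-training ans= values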
], tot_loss[loss=0.06538, simple_loss=0.0892, pruned_loss=0.01197, audio_tagging_loss=0.008807, over 3055734.34 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:03:55,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3841493.3333333335, ans=0.0 2023-11-29 06:03:56,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3841493.3333333335, ans=0.125 2023-11-29 06:04:08,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3841560.0, ans=0.125 2023-11-29 06:04:10,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3841560.0, ans=0.0 2023-11-29 06:04:16,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3841626.6666666665, ans=0.0 2023-11-29 06:04:17,590 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576250 2023-11-29 06:04:25,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3841693.3333333335, ans=0.125 2023-11-29 06:04:33,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.155e+01 9.764e+01 1.030e+02 1.216e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 06:04:38,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3841760.0, ans=0.05 2023-11-29 06:04:38,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3841760.0, ans=0.1 2023-11-29 06:04:40,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3841760.0, ans=0.2 2023-11-29 06:04:47,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3841826.6666666665, ans=0.1 2023-11-29 06:04:48,691 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11150, loss[loss=0.05503, simple_loss=0.0791, pruned_loss=0.006168, audio_tagging_loss=0.009311, over 14764.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08843, pruned_loss=0.01175, audio_tagging_loss=0.008902, over 3051273.96 frames. 
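The grad_scale field moves between 32.0, 16.0 and 8.0 across these records, the signature of dynamic loss scaling under fp16: the scaler halves the scale after an overflow and grows it back after a run of clean steps. A sketch of that control loop with torch.cuda.amp (init_scale is taken from the logged values; the growth interval is PyTorch's default, not a value read from this run):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def training_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()   # scaled loss keeps fp16 grads in range
        scaler.step(optimizer)          # skips the update if grads overflowed
        scaler.update()                 # halves scale on overflow, grows otherwise
        return loss.detach(), scaler.get_scale()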
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:04:50,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3841826.6666666665, ans=0.0 2023-11-29 06:04:55,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3841826.6666666665, ans=0.125 2023-11-29 06:05:10,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3841893.3333333335, ans=0.1 2023-11-29 06:05:17,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3841960.0, ans=0.125 2023-11-29 06:05:18,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576300 2023-11-29 06:05:35,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3842026.6666666665, ans=0.125 2023-11-29 06:05:37,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3842093.3333333335, ans=0.125 2023-11-29 06:05:42,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-11-29 06:05:42,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2023-11-29 06:05:47,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-11-29 06:05:51,044 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11200, loss[loss=0.0592, simple_loss=0.08115, pruned_loss=0.01045, audio_tagging_loss=0.008179, over 16032.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08868, pruned_loss=0.01204, audio_tagging_loss=0.00891, over 3044238.16 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:05:57,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3842160.0, ans=0.0 2023-11-29 06:06:02,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2023-11-29 06:06:19,526 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576350 2023-11-29 06:06:23,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3842293.3333333335, ans=0.1 2023-11-29 06:06:32,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3842360.0, ans=0.015 2023-11-29 06:06:37,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.953e+01 9.671e+01 1.033e+02 1.680e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:06:51,554 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11250, loss[loss=0.06947, simple_loss=0.1026, pruned_loss=0.01142, audio_tagging_loss=0.006771, over 16029.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08831, pruned_loss=0.01199, audio_tagging_loss=0.00886, over 3052108.92 frames. 
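The scaling.py:1022 records compare a per-module whitening metric against a limit (e.g. "metric=6.79 vs. limit=12.0"). A natural reading is that the metric measures how anisotropic the feature covariance is, with a corrective gradient applied only while it exceeds the limit. A hedged sketch of one such metric; the definition below is an assumption, not the recipe's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """Anisotropy of the feature covariance: 1.0 when already white,
        larger as the eigenvalues spread out. x: (num_frames, num_channels)."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        # trace(C @ C) * d / trace(C)^2 == d * sum(l_i^2) / (sum l_i)^2 >= 1
        return torch.einsum("ij,ji->", cov, cov) * d / cov.trace() ** 2

    feats = torch.randn(1000, 256)
    metric, limit = whitening_metric(feats), 15.0
    apply_penalty = metric > limit   # only then push the module toward whiteness
    print(f"metric={metric:.2f} vs. limit={limit}")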
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:07:06,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3842560.0, ans=0.07 2023-11-29 06:07:09,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-29 06:07:21,429 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576400 2023-11-29 06:07:42,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3842760.0, ans=0.125 2023-11-29 06:07:53,073 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11300, loss[loss=0.06904, simple_loss=0.09425, pruned_loss=0.013, audio_tagging_loss=0.008922, over 14494.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08902, pruned_loss=0.0121, audio_tagging_loss=0.008687, over 3050857.91 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:07:56,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3842826.6666666665, ans=0.125 2023-11-29 06:08:23,079 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576450 2023-11-29 06:08:26,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3842960.0, ans=0.09899494936611666 2023-11-29 06:08:34,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3843026.6666666665, ans=0.125 2023-11-29 06:08:42,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.108e+01 9.647e+01 1.054e+02 1.325e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 06:08:55,269 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11350, loss[loss=0.07988, simple_loss=0.1055, pruned_loss=0.01639, audio_tagging_loss=0.01072, over 16592.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08869, pruned_loss=0.012, audio_tagging_loss=0.008559, over 3046888.48 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:09:24,788 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576500 2023-11-29 06:09:27,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3843293.3333333335, ans=0.125 2023-11-29 06:09:29,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3843293.3333333335, ans=15.0 2023-11-29 06:09:36,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=22.5 2023-11-29 06:09:44,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-29 06:09:56,518 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11400, loss[loss=0.05894, simple_loss=0.07644, pruned_loss=0.01009, audio_tagging_loss=0.01063, over 15023.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08883, pruned_loss=0.0119, audio_tagging_loss=0.008534, over 3048915.55 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:10:14,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3843560.0, ans=0.125 2023-11-29 06:10:26,238 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576550 2023-11-29 06:10:34,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3843693.3333333335, ans=0.125 2023-11-29 06:10:45,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 9.236e+01 1.000e+02 1.071e+02 1.321e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-29 06:10:57,926 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11450, loss[loss=0.0463, simple_loss=0.05702, pruned_loss=0.006392, audio_tagging_loss=0.01139, over 13854.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08846, pruned_loss=0.01173, audio_tagging_loss=0.008485, over 3044699.75 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:11:03,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2023-11-29 06:11:10,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3843893.3333333335, ans=0.125 2023-11-29 06:11:11,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3843893.3333333335, ans=0.0 2023-11-29 06:11:21,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3843960.0, ans=0.0 2023-11-29 06:11:23,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3843960.0, ans=0.2 2023-11-29 06:11:28,179 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576600 2023-11-29 06:11:57,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3844093.3333333335, ans=0.2 2023-11-29 06:11:58,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3844093.3333333335, ans=0.2 2023-11-29 06:12:00,458 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11500, loss[loss=0.06293, simple_loss=0.0848, pruned_loss=0.01268, audio_tagging_loss=0.007849, over 15628.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08787, pruned_loss=0.01178, audio_tagging_loss=0.008493, over 3047062.11 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:12:03,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3844160.0, ans=0.1 2023-11-29 06:12:10,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3844160.0, ans=0.125 2023-11-29 06:12:22,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.79 vs. 
limit=12.0 2023-11-29 06:12:30,176 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576650 2023-11-29 06:12:50,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.910e+01 9.568e+01 1.052e+02 1.642e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 06:12:57,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3844426.6666666665, ans=0.125 2023-11-29 06:13:02,165 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11550, loss[loss=0.08122, simple_loss=0.1122, pruned_loss=0.01534, audio_tagging_loss=0.009773, over 15827.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08828, pruned_loss=0.01191, audio_tagging_loss=0.008512, over 3042647.25 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:13:02,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-29 06:13:23,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3844560.0, ans=0.035 2023-11-29 06:13:32,422 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576700 2023-11-29 06:13:42,667 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:14:03,749 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11600, loss[loss=0.05741, simple_loss=0.0847, pruned_loss=0.006565, audio_tagging_loss=0.008499, over 15411.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08783, pruned_loss=0.01179, audio_tagging_loss=0.008556, over 3044600.05 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:14:04,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3844826.6666666665, ans=0.125 2023-11-29 06:14:07,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3844826.6666666665, ans=0.0 2023-11-29 06:14:07,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0 2023-11-29 06:14:15,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3844893.3333333335, ans=0.1 2023-11-29 06:14:23,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.02 vs. 
limit=15.0 2023-11-29 06:14:28,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3844960.0, ans=0.125 2023-11-29 06:14:32,689 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576750 2023-11-29 06:14:40,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3845026.6666666665, ans=0.125 2023-11-29 06:14:41,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3845026.6666666665, ans=0.07 2023-11-29 06:14:55,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.877e+01 9.031e+01 9.516e+01 1.044e+02 1.307e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 06:15:05,687 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11650, loss[loss=0.08766, simple_loss=0.1321, pruned_loss=0.01508, audio_tagging_loss=0.006544, over 16081.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08899, pruned_loss=0.01192, audio_tagging_loss=0.008468, over 3052171.28 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:15:16,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3845226.6666666665, ans=0.125 2023-11-29 06:15:20,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-29 06:15:33,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3845293.3333333335, ans=0.0 2023-11-29 06:15:35,346 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576800 2023-11-29 06:15:39,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3845293.3333333335, ans=0.125 2023-11-29 06:15:58,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3845426.6666666665, ans=0.09899494936611666 2023-11-29 06:16:06,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2023-11-29 06:16:07,161 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11700, loss[loss=0.05693, simple_loss=0.07636, pruned_loss=0.01038, audio_tagging_loss=0.00837, over 14909.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08856, pruned_loss=0.01183, audio_tagging_loss=0.008491, over 3047667.74 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:16:13,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3845493.3333333335, ans=0.125 2023-11-29 06:16:16,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. 
limit=15.0 2023-11-29 06:16:18,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3845560.0, ans=0.125 2023-11-29 06:16:18,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3845560.0, ans=0.0 2023-11-29 06:16:34,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3845626.6666666665, ans=0.125 2023-11-29 06:16:37,011 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576850 2023-11-29 06:16:47,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3845693.3333333335, ans=0.2 2023-11-29 06:16:58,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.889e+01 9.558e+01 1.009e+02 1.379e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 06:16:59,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3845760.0, ans=0.0 2023-11-29 06:17:09,151 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11750, loss[loss=0.06239, simple_loss=0.08641, pruned_loss=0.01116, audio_tagging_loss=0.008022, over 15370.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08878, pruned_loss=0.01185, audio_tagging_loss=0.008419, over 3048212.40 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:17:12,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-29 06:17:19,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3845893.3333333335, ans=0.125 2023-11-29 06:17:38,168 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576900 2023-11-29 06:17:43,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3845960.0, ans=0.125 2023-11-29 06:18:02,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3846093.3333333335, ans=0.2 2023-11-29 06:18:10,131 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11800, loss[loss=0.04533, simple_loss=0.0539, pruned_loss=0.006651, audio_tagging_loss=0.01173, over 14638.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08865, pruned_loss=0.01181, audio_tagging_loss=0.008466, over 3044182.95 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:18:24,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3846226.6666666665, ans=0.125 2023-11-29 06:18:24,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3846226.6666666665, ans=0.1 2023-11-29 06:18:30,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3846226.6666666665, ans=0.125 2023-11-29 06:18:39,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 576950 2023-11-29 06:18:49,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3846360.0, ans=0.1 2023-11-29 06:18:56,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3846360.0, ans=0.125 2023-11-29 06:19:01,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 9.085e+01 9.909e+01 1.081e+02 1.450e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 06:19:06,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3846426.6666666665, ans=0.0 2023-11-29 06:19:10,587 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11850, loss[loss=0.04546, simple_loss=0.05897, pruned_loss=0.007131, audio_tagging_loss=0.008848, over 14987.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08862, pruned_loss=0.01192, audio_tagging_loss=0.008683, over 3041732.05 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:19:40,349 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577000 2023-11-29 06:19:49,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3846693.3333333335, ans=0.09899494936611666 2023-11-29 06:20:06,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=15.0 2023-11-29 06:20:11,117 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11900, loss[loss=0.08829, simple_loss=0.1123, pruned_loss=0.02235, audio_tagging_loss=0.009769, over 15111.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08896, pruned_loss=0.01192, audio_tagging_loss=0.008743, over 3043290.73 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:20:32,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-29 06:20:41,489 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577050 2023-11-29 06:20:52,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3847026.6666666665, ans=0.125 2023-11-29 06:20:54,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. 
limit=6.0 2023-11-29 06:21:02,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.006e+01 9.638e+01 1.018e+02 1.407e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 06:21:05,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3847093.3333333335, ans=10.0 2023-11-29 06:21:09,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3847093.3333333335, ans=0.0 2023-11-29 06:21:13,615 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 11950, loss[loss=0.06797, simple_loss=0.07636, pruned_loss=0.01818, audio_tagging_loss=0.01161, over 16411.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08792, pruned_loss=0.01175, audio_tagging_loss=0.00887, over 3044752.03 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:21:18,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3847160.0, ans=0.0 2023-11-29 06:21:24,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847226.6666666665, ans=0.1 2023-11-29 06:21:41,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3847293.3333333335, ans=0.1 2023-11-29 06:21:42,246 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577100 2023-11-29 06:21:46,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3847293.3333333335, ans=0.0 2023-11-29 06:21:53,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3847360.0, ans=0.125 2023-11-29 06:22:00,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3847426.6666666665, ans=0.0 2023-11-29 06:22:12,427 INFO [train_asr.py:1235] (1/4) Epoch 48, batch 12000, loss[loss=0.08017, simple_loss=0.1134, pruned_loss=0.01499, audio_tagging_loss=0.008487, over 15152.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08806, pruned_loss=0.01177, audio_tagging_loss=0.008909, over 3051392.48 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:22:12,428 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 06:22:52,515 INFO [train_asr.py:1267] (1/4) Epoch 48, validation: loss=0.05839, simple_loss=0.05056, pruned_loss=0.005496, audio_tagging_loss=0.02761, over 4681554.00 frames. 2023-11-29 06:22:52,515 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 06:23:11,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2023-11-29 06:23:15,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3847626.6666666665, ans=0.2 2023-11-29 06:23:18,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3847626.6666666665, ans=0.0 2023-11-29 06:23:44,137 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 0, loss[loss=0.08592, simple_loss=0.1026, pruned_loss=0.01622, audio_tagging_loss=0.01839, over 16024.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.1026, pruned_loss=0.01622, audio_tagging_loss=0.01839, over 16024.00 frames. 
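Each validation pass above ends by reporting peak CUDA memory ("Maximum memory allocated so far is 25568MB"); that figure is the allocator's high-water mark, not current usage. A minimal sketch of the pattern, with hypothetical helper names:

    import torch

    def run_validation(model, dataloader, compute_loss, device="cuda:1"):
        model.eval()
        tot, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dataloader:
                loss, n = compute_loss(model, batch)   # (summed loss, num frames)
                tot, frames = tot + loss.item(), frames + n
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot / frames:.4f}")
        print(f"Maximum memory allocated so far is {mb}MB")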
], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:23:44,138 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 06:24:02,219 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8358, 5.8736, 5.9102, 5.9221], device='cuda:1') 2023-11-29 06:24:12,655 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3365, 4.3065, 4.4792, 4.4498], device='cuda:1') 2023-11-29 06:24:16,651 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8579, 4.9603, 5.1638, 4.9314], device='cuda:1') 2023-11-29 06:24:20,372 INFO [train_asr.py:1267] (1/4) Epoch 49, validation: loss=0.05827, simple_loss=0.05045, pruned_loss=0.005376, audio_tagging_loss=0.02767, over 4681554.00 frames. 2023-11-29 06:24:20,382 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 06:24:20,502 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-29 06:24:20,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3847653.3333333335, ans=0.09899494936611666 2023-11-29 06:24:21,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3847653.3333333335, ans=0.2 2023-11-29 06:24:38,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3847720.0, ans=0.0 2023-11-29 06:24:42,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.193e+01 9.994e+01 1.113e+02 1.489e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 06:24:44,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3847786.6666666665, ans=0.2 2023-11-29 06:24:59,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-29 06:25:07,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2023-11-29 06:25:22,987 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 50, loss[loss=0.07064, simple_loss=0.09387, pruned_loss=0.01218, audio_tagging_loss=0.01152, over 14740.00 frames. ], tot_loss[loss=0.07303, simple_loss=0.0896, pruned_loss=0.01163, audio_tagging_loss=0.0166, over 693630.54 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:25:23,093 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-29 06:26:07,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3848186.6666666665, ans=0.125 2023-11-29 06:26:11,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3848186.6666666665, ans=0.1 2023-11-29 06:26:12,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3848253.3333333335, ans=0.125 2023-11-29 06:26:25,063 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 100, loss[loss=0.08821, simple_loss=0.1185, pruned_loss=0.01866, audio_tagging_loss=0.01028, over 14426.00 frames. 
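The zipformer.py:1877 records dump attn_weights_entropy during validation, one value per attention head; values near log(seq_len) would indicate nearly uniform attention, small values a sharply peaked pattern. A sketch of that diagnostic, with the tensor layout below assumed rather than taken from the model code:

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """attn_weights: (num_heads, batch, tgt_len, src_len), rows sum to 1.
        Returns the mean entropy per head, as in the logged tensors."""
        eps = 1.0e-20
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return ent.mean(dim=(1, 2))    # average over batch and query positions

    w = torch.softmax(torch.randn(4, 2, 50, 50), dim=-1)
    print(attn_weights_entropy(w))     # ~log(50) per head if near-uniform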
], tot_loss[loss=0.07329, simple_loss=0.09026, pruned_loss=0.01216, audio_tagging_loss=0.016, over 1216445.89 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:26:25,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-29 06:26:27,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3848320.0, ans=0.0 2023-11-29 06:26:37,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3848386.6666666665, ans=0.125 2023-11-29 06:26:49,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 9.815e+01 1.050e+02 1.112e+02 1.329e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-29 06:27:08,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-29 06:27:27,378 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 150, loss[loss=0.06593, simple_loss=0.09352, pruned_loss=0.0103, audio_tagging_loss=0.008868, over 15188.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09156, pruned_loss=0.01227, audio_tagging_loss=0.01423, over 1628608.27 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:27:27,510 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-29 06:27:35,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3848653.3333333335, ans=0.1 2023-11-29 06:27:46,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-29 06:28:08,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-29 06:28:31,055 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 200, loss[loss=0.08243, simple_loss=0.1082, pruned_loss=0.01746, audio_tagging_loss=0.01087, over 15031.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.09158, pruned_loss=0.01229, audio_tagging_loss=0.01264, over 1935665.94 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:28:31,173 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-29 06:28:42,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-29 06:28:53,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 9.385e+01 9.861e+01 1.084e+02 1.515e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-29 06:29:03,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3849120.0, ans=0.125 2023-11-29 06:29:10,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3849186.6666666665, ans=0.125 2023-11-29 06:29:31,535 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 250, loss[loss=0.07564, simple_loss=0.1106, pruned_loss=0.01392, audio_tagging_loss=0.006423, over 14506.00 frames. ], tot_loss[loss=0.0699, simple_loss=0.09234, pruned_loss=0.01238, audio_tagging_loss=0.01134, over 2180066.40 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:29:31,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-29 06:29:41,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3849320.0, ans=0.0 2023-11-29 06:29:47,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3849386.6666666665, ans=0.07 2023-11-29 06:30:04,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3849453.3333333335, ans=0.1 2023-11-29 06:30:04,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2023-11-29 06:30:05,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3849453.3333333335, ans=0.125 2023-11-29 06:30:30,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-29 06:30:33,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-11-29 06:30:34,230 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 300, loss[loss=0.06321, simple_loss=0.08865, pruned_loss=0.008185, audio_tagging_loss=0.0107, over 15505.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09239, pruned_loss=0.01241, audio_tagging_loss=0.01041, over 2371422.02 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:30:34,336 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-29 06:30:58,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.309e+01 1.014e+02 1.083e+02 1.326e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-29 06:31:04,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2023-11-29 06:31:06,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3849786.6666666665, ans=0.2 2023-11-29 06:31:14,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3849853.3333333335, ans=0.04949747468305833 2023-11-29 06:31:27,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3849920.0, ans=0.0 2023-11-29 06:31:29,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=12.0 2023-11-29 06:31:37,033 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 350, loss[loss=0.06764, simple_loss=0.09371, pruned_loss=0.01261, audio_tagging_loss=0.008173, over 15007.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09039, pruned_loss=0.0121, audio_tagging_loss=0.01004, over 2520320.24 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:31:37,124 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-29 06:32:12,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3850186.6666666665, ans=0.125 2023-11-29 06:32:32,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0 2023-11-29 06:32:39,067 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 400, loss[loss=0.09091, simple_loss=0.1356, pruned_loss=0.01604, audio_tagging_loss=0.007044, over 16322.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08977, pruned_loss=0.01196, audio_tagging_loss=0.009626, over 2634650.44 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:32:39,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-29 06:32:44,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-29 06:32:46,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3850320.0, ans=0.0 2023-11-29 06:32:56,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3850386.6666666665, ans=0.0 2023-11-29 06:32:57,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3850386.6666666665, ans=0.125 2023-11-29 06:33:02,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.755e+01 9.458e+01 1.037e+02 1.447e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 06:33:33,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3850586.6666666665, ans=0.05 2023-11-29 06:33:41,956 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 450, loss[loss=0.05494, simple_loss=0.07651, pruned_loss=0.01001, audio_tagging_loss=0.006684, over 14096.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.0901, pruned_loss=0.01191, audio_tagging_loss=0.009284, over 2727737.64 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:33:42,052 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-29 06:33:42,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3850653.3333333335, ans=0.95 2023-11-29 06:33:46,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3850653.3333333335, ans=0.125 2023-11-29 06:33:57,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3850720.0, ans=0.1 2023-11-29 06:34:42,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3850920.0, ans=0.125 2023-11-29 06:34:42,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3850920.0, ans=0.0 2023-11-29 06:34:45,307 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 500, loss[loss=0.082, simple_loss=0.1167, pruned_loss=0.01843, audio_tagging_loss=0.005228, over 14948.00 frames. 
], tot_loss[loss=0.06568, simple_loss=0.08941, pruned_loss=0.01183, audio_tagging_loss=0.009143, over 2795082.99 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:34:45,464 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-29 06:34:56,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3851053.3333333335, ans=0.0 2023-11-29 06:35:01,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3851053.3333333335, ans=0.125 2023-11-29 06:35:09,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.909e+01 9.530e+01 1.043e+02 1.565e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 06:35:14,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3851120.0, ans=0.2 2023-11-29 06:35:19,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3851120.0, ans=0.125 2023-11-29 06:35:47,389 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 550, loss[loss=0.06406, simple_loss=0.08501, pruned_loss=0.009632, audio_tagging_loss=0.01193, over 15085.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08955, pruned_loss=0.01193, audio_tagging_loss=0.009011, over 2847604.99 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:35:47,514 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-29 06:35:50,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2023-11-29 06:35:57,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3851386.6666666665, ans=0.125 2023-11-29 06:36:26,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3851520.0, ans=0.125 2023-11-29 06:36:39,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-11-29 06:36:44,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3851586.6666666665, ans=0.125 2023-11-29 06:36:49,883 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 600, loss[loss=0.0568, simple_loss=0.07633, pruned_loss=0.01108, audio_tagging_loss=0.007547, over 14568.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08851, pruned_loss=0.0117, audio_tagging_loss=0.00893, over 2886866.53 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:36:50,001 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-29 06:36:50,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.15 vs. 
limit=5.0 2023-11-29 06:36:54,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3851653.3333333335, ans=0.125 2023-11-29 06:36:57,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3851653.3333333335, ans=0.0 2023-11-29 06:36:57,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3851653.3333333335, ans=0.125 2023-11-29 06:37:00,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3851720.0, ans=0.1 2023-11-29 06:37:03,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3851720.0, ans=0.0 2023-11-29 06:37:14,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.849e+01 9.501e+01 1.048e+02 1.415e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 06:37:19,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3851786.6666666665, ans=0.1 2023-11-29 06:37:22,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3851786.6666666665, ans=0.125 2023-11-29 06:37:49,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3851920.0, ans=0.05 2023-11-29 06:37:52,648 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 650, loss[loss=0.06709, simple_loss=0.1011, pruned_loss=0.009706, audio_tagging_loss=0.006828, over 16373.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08853, pruned_loss=0.01165, audio_tagging_loss=0.008903, over 2923971.35 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:37:52,739 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-29 06:37:57,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3851986.6666666665, ans=0.125 2023-11-29 06:38:03,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-29 06:38:39,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3852186.6666666665, ans=0.95 2023-11-29 06:38:45,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3852253.3333333335, ans=0.125 2023-11-29 06:38:50,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3852253.3333333335, ans=0.04949747468305833 2023-11-29 06:38:55,929 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 700, loss[loss=0.06644, simple_loss=0.08617, pruned_loss=0.01363, audio_tagging_loss=0.009719, over 15420.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09048, pruned_loss=0.01207, audio_tagging_loss=0.008767, over 2961513.14 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:38:56,032 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-29 06:39:19,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3852453.3333333335, ans=0.1 2023-11-29 06:39:20,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.141e+01 9.968e+01 1.043e+02 1.174e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-29 06:39:23,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-29 06:39:25,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3852453.3333333335, ans=0.1 2023-11-29 06:39:34,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3852520.0, ans=0.2 2023-11-29 06:39:39,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3852520.0, ans=0.125 2023-11-29 06:39:58,573 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 750, loss[loss=0.06258, simple_loss=0.08596, pruned_loss=0.0107, audio_tagging_loss=0.008898, over 15354.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09047, pruned_loss=0.01207, audio_tagging_loss=0.008804, over 2979838.95 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:39:58,664 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-29 06:40:15,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-11-29 06:40:40,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3852853.3333333335, ans=0.125 2023-11-29 06:41:00,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3852986.6666666665, ans=0.0 2023-11-29 06:41:01,474 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 800, loss[loss=0.04327, simple_loss=0.05475, pruned_loss=0.004474, audio_tagging_loss=0.01142, over 15769.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08967, pruned_loss=0.01188, audio_tagging_loss=0.008864, over 3001575.54 frames. 
], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:41:01,591 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-29 06:41:22,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3853053.3333333335, ans=0.125 2023-11-29 06:41:22,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3853053.3333333335, ans=0.125 2023-11-29 06:41:22,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3853053.3333333335, ans=0.2 2023-11-29 06:41:24,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3853053.3333333335, ans=0.125 2023-11-29 06:41:26,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.191e+01 9.688e+01 1.032e+02 1.219e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 06:41:36,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3853120.0, ans=0.125 2023-11-29 06:42:04,110 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 850, loss[loss=0.0656, simple_loss=0.09045, pruned_loss=0.01435, audio_tagging_loss=0.006028, over 14505.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08979, pruned_loss=0.01197, audio_tagging_loss=0.008856, over 3015054.43 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:42:04,204 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-29 06:42:08,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3853320.0, ans=0.0 2023-11-29 06:42:17,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3853386.6666666665, ans=0.1 2023-11-29 06:42:20,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3853386.6666666665, ans=0.0 2023-11-29 06:42:27,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-29 06:42:37,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3853453.3333333335, ans=0.0 2023-11-29 06:42:53,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3853586.6666666665, ans=0.0 2023-11-29 06:43:05,892 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 900, loss[loss=0.0455, simple_loss=0.05483, pruned_loss=0.008043, audio_tagging_loss=0.01004, over 15602.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08944, pruned_loss=0.01205, audio_tagging_loss=0.008878, over 3016976.23 frames. 
], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:43:06,005 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-29 06:43:09,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3853653.3333333335, ans=0.1 2023-11-29 06:43:14,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3853653.3333333335, ans=0.125 2023-11-29 06:43:15,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2023-11-29 06:43:20,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3853720.0, ans=0.0 2023-11-29 06:43:27,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3853720.0, ans=0.125 2023-11-29 06:43:33,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.400e+01 1.003e+02 1.065e+02 1.240e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 06:43:49,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3853853.3333333335, ans=0.125 2023-11-29 06:43:56,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3853920.0, ans=0.2 2023-11-29 06:43:56,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-29 06:43:59,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3853920.0, ans=0.125 2023-11-29 06:44:08,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2023-11-29 06:44:09,232 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 950, loss[loss=0.07734, simple_loss=0.1069, pruned_loss=0.01565, audio_tagging_loss=0.008226, over 15211.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09035, pruned_loss=0.01218, audio_tagging_loss=0.008718, over 3028832.72 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:44:09,337 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-29 06:44:11,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.64 vs. 
limit=15.0 2023-11-29 06:44:29,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3854053.3333333335, ans=0.2 2023-11-29 06:44:30,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3854053.3333333335, ans=0.125 2023-11-29 06:44:36,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3854120.0, ans=0.0 2023-11-29 06:44:41,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3854120.0, ans=0.125 2023-11-29 06:44:56,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3854186.6666666665, ans=0.0 2023-11-29 06:45:01,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-29 06:45:10,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.96 vs. limit=15.0 2023-11-29 06:45:11,245 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1000, loss[loss=0.05545, simple_loss=0.07408, pruned_loss=0.009307, audio_tagging_loss=0.009103, over 14907.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08997, pruned_loss=0.01203, audio_tagging_loss=0.008629, over 3028118.11 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:45:11,348 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-29 06:45:21,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-11-29 06:45:23,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3854386.6666666665, ans=0.1 2023-11-29 06:45:37,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.899e+01 9.614e+01 1.019e+02 1.244e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 06:45:39,628 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:46:05,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3854586.6666666665, ans=0.125 2023-11-29 06:46:12,505 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1050, loss[loss=0.05251, simple_loss=0.07674, pruned_loss=0.007561, audio_tagging_loss=0.006584, over 15479.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08958, pruned_loss=0.01197, audio_tagging_loss=0.008501, over 3032086.16 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:46:12,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-29 06:46:20,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854653.3333333335, ans=0.1 2023-11-29 06:46:23,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854653.3333333335, ans=0.1 2023-11-29 06:46:29,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3854720.0, ans=0.125 2023-11-29 06:46:32,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3854720.0, ans=0.125 2023-11-29 06:47:00,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-29 06:47:07,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3854920.0, ans=0.2 2023-11-29 06:47:09,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3854920.0, ans=0.125 2023-11-29 06:47:15,129 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1100, loss[loss=0.05276, simple_loss=0.065, pruned_loss=0.009655, audio_tagging_loss=0.0106, over 14143.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08908, pruned_loss=0.01197, audio_tagging_loss=0.008496, over 3036935.92 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:47:15,213 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-29 06:47:19,604 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:47:26,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3855053.3333333335, ans=0.0 2023-11-29 06:47:30,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3855053.3333333335, ans=0.0 2023-11-29 06:47:40,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.246e+01 9.671e+01 1.044e+02 1.404e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:47:46,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3855120.0, ans=0.1 2023-11-29 06:48:18,231 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1150, loss[loss=0.06932, simple_loss=0.08787, pruned_loss=0.01771, audio_tagging_loss=0.007671, over 15805.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08905, pruned_loss=0.01196, audio_tagging_loss=0.008448, over 3036170.18 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:48:18,339 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-29 06:48:46,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3855453.3333333335, ans=0.1 2023-11-29 06:49:04,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3855520.0, ans=0.125 2023-11-29 06:49:06,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3855586.6666666665, ans=0.5 2023-11-29 06:49:19,701 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1200, loss[loss=0.05417, simple_loss=0.07307, pruned_loss=0.009567, audio_tagging_loss=0.008064, over 15682.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08819, pruned_loss=0.01189, audio_tagging_loss=0.008471, over 3030855.96 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:49:19,831 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-29 06:49:31,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3855720.0, ans=0.1 2023-11-29 06:49:31,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-29 06:49:33,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3855720.0, ans=0.05 2023-11-29 06:49:34,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3855720.0, ans=0.125 2023-11-29 06:49:47,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.935e+01 9.457e+01 1.024e+02 1.157e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-29 06:49:47,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3855786.6666666665, ans=0.0 2023-11-29 06:49:50,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3855786.6666666665, ans=0.125 2023-11-29 06:49:52,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-11-29 06:49:57,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3855853.3333333335, ans=0.125 2023-11-29 06:50:02,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-29 06:50:15,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3855920.0, ans=0.0 2023-11-29 06:50:21,587 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1250, loss[loss=0.06728, simple_loss=0.09381, pruned_loss=0.009874, audio_tagging_loss=0.0105, over 14682.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08794, pruned_loss=0.01165, audio_tagging_loss=0.008489, over 3023912.57 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:50:21,712 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-29 06:50:21,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3855986.6666666665, ans=0.0 2023-11-29 06:50:41,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3856053.3333333335, ans=10.0 2023-11-29 06:50:41,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2023-11-29 06:50:47,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3856120.0, ans=0.025 2023-11-29 06:50:47,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2023-11-29 06:51:02,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3856186.6666666665, ans=0.125 2023-11-29 06:51:04,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3856186.6666666665, ans=0.125 2023-11-29 06:51:11,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3856253.3333333335, ans=0.035 2023-11-29 06:51:24,788 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1300, loss[loss=0.05987, simple_loss=0.08659, pruned_loss=0.01088, audio_tagging_loss=0.005697, over 14397.00 frames. ], tot_loss[loss=0.06414, simple_loss=0.08792, pruned_loss=0.01167, audio_tagging_loss=0.008507, over 3027488.72 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:51:24,894 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-29 06:51:25,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3856320.0, ans=0.0 2023-11-29 06:51:29,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3856320.0, ans=0.95 2023-11-29 06:51:37,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3856386.6666666665, ans=0.035 2023-11-29 06:51:50,531 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 8.934e+01 9.381e+01 1.015e+02 1.347e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-29 06:52:06,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3856520.0, ans=0.05 2023-11-29 06:52:15,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3856586.6666666665, ans=0.125 2023-11-29 06:52:21,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3856586.6666666665, ans=0.0 2023-11-29 06:52:22,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. 
limit=15.0 2023-11-29 06:52:25,854 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1350, loss[loss=0.08534, simple_loss=0.1172, pruned_loss=0.01663, audio_tagging_loss=0.0101, over 15214.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08811, pruned_loss=0.01164, audio_tagging_loss=0.008523, over 3031642.96 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:52:25,934 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-29 06:52:44,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3856720.0, ans=0.125 2023-11-29 06:52:51,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-29 06:52:56,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-29 06:52:56,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-29 06:53:01,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.81 vs. limit=12.0 2023-11-29 06:53:04,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3856853.3333333335, ans=0.0 2023-11-29 06:53:10,937 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:53:14,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3856920.0, ans=0.125 2023-11-29 06:53:18,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3856920.0, ans=0.1 2023-11-29 06:53:26,929 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1400, loss[loss=0.083, simple_loss=0.1237, pruned_loss=0.01687, audio_tagging_loss=0.004306, over 15591.00 frames. ], tot_loss[loss=0.06393, simple_loss=0.08756, pruned_loss=0.01155, audio_tagging_loss=0.008604, over 3041328.79 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:53:27,043 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-29 06:53:31,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3856986.6666666665, ans=0.0 2023-11-29 06:53:37,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3857053.3333333335, ans=0.2 2023-11-29 06:53:45,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3857053.3333333335, ans=0.125 2023-11-29 06:53:54,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 9.091e+01 9.742e+01 1.050e+02 1.544e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 06:54:05,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3857186.6666666665, ans=0.0 2023-11-29 06:54:29,546 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1450, loss[loss=0.09278, simple_loss=0.1309, pruned_loss=0.0203, audio_tagging_loss=0.007001, over 15314.00 frames. ], tot_loss[loss=0.0639, simple_loss=0.08743, pruned_loss=0.01152, audio_tagging_loss=0.008661, over 3037299.33 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:54:29,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-29 06:55:08,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3857520.0, ans=0.1 2023-11-29 06:55:15,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3857520.0, ans=0.0 2023-11-29 06:55:29,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3857586.6666666665, ans=0.09899494936611666 2023-11-29 06:55:31,284 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1500, loss[loss=0.06214, simple_loss=0.07809, pruned_loss=0.01178, audio_tagging_loss=0.01131, over 15222.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08878, pruned_loss=0.01166, audio_tagging_loss=0.008708, over 3035656.80 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:55:31,374 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578650 2023-11-29 06:55:43,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3857720.0, ans=0.125 2023-11-29 06:55:51,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3857720.0, ans=0.125 2023-11-29 06:55:57,801 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.098e+01 9.715e+01 1.024e+02 1.252e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 06:56:15,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3857853.3333333335, ans=0.125 2023-11-29 06:56:20,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3857920.0, ans=0.0 2023-11-29 06:56:29,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=22.5 2023-11-29 06:56:30,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-29 06:56:32,901 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1550, loss[loss=0.03715, simple_loss=0.033, pruned_loss=0.003884, audio_tagging_loss=0.01677, over 13946.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08934, pruned_loss=0.01181, audio_tagging_loss=0.008798, over 3034631.16 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:56:32,981 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578700 2023-11-29 06:56:41,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3857986.6666666665, ans=0.125 2023-11-29 06:56:50,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3858053.3333333335, ans=0.1 2023-11-29 06:57:02,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2023-11-29 06:57:30,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2023-11-29 06:57:34,238 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1600, loss[loss=0.06021, simple_loss=0.08041, pruned_loss=0.01074, audio_tagging_loss=0.00926, over 15474.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08859, pruned_loss=0.01172, audio_tagging_loss=0.00884, over 3038944.34 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:57:34,334 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578750 2023-11-29 06:57:38,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.84 vs. limit=10.0 2023-11-29 06:57:43,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3858320.0, ans=0.0 2023-11-29 06:57:48,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3858386.6666666665, ans=0.125 2023-11-29 06:57:56,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3858386.6666666665, ans=0.0 2023-11-29 06:58:00,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 9.073e+01 9.678e+01 1.045e+02 1.590e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 06:58:05,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3858453.3333333335, ans=0.0 2023-11-29 06:58:07,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. 
limit=15.0 2023-11-29 06:58:12,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3858520.0, ans=0.1 2023-11-29 06:58:16,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3858520.0, ans=0.125 2023-11-29 06:58:18,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3858520.0, ans=0.035 2023-11-29 06:58:35,992 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1650, loss[loss=0.0787, simple_loss=0.1025, pruned_loss=0.01934, audio_tagging_loss=0.008088, over 15278.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08933, pruned_loss=0.01181, audio_tagging_loss=0.008851, over 3047786.98 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:58:36,099 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578800 2023-11-29 06:58:42,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3858653.3333333335, ans=0.125 2023-11-29 06:58:56,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3858720.0, ans=0.0 2023-11-29 06:59:06,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-29 06:59:13,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3858853.3333333335, ans=0.125 2023-11-29 06:59:19,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3858853.3333333335, ans=0.125 2023-11-29 06:59:19,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3858853.3333333335, ans=0.125 2023-11-29 06:59:33,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3858920.0, ans=0.125 2023-11-29 06:59:37,383 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1700, loss[loss=0.07443, simple_loss=0.1006, pruned_loss=0.0124, audio_tagging_loss=0.01175, over 14860.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08859, pruned_loss=0.01168, audio_tagging_loss=0.008862, over 3042533.77 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:59:37,503 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578850 2023-11-29 06:59:45,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3858986.6666666665, ans=0.015 2023-11-29 06:59:58,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3859053.3333333335, ans=0.125 2023-11-29 07:00:07,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.933e+01 9.056e+01 9.736e+01 1.037e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 07:00:09,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3859120.0, ans=0.125 2023-11-29 07:00:39,579 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1750, loss[loss=0.06674, simple_loss=0.09447, pruned_loss=0.01211, audio_tagging_loss=0.007397, over 15092.00 frames. 
], tot_loss[loss=0.06452, simple_loss=0.08832, pruned_loss=0.01153, audio_tagging_loss=0.008823, over 3046811.47 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:00:39,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578900 2023-11-29 07:01:07,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3859453.3333333335, ans=0.1 2023-11-29 07:01:09,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3859453.3333333335, ans=0.125 2023-11-29 07:01:37,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3859586.6666666665, ans=0.0 2023-11-29 07:01:42,540 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1800, loss[loss=0.05427, simple_loss=0.07172, pruned_loss=0.01056, audio_tagging_loss=0.007848, over 15561.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08946, pruned_loss=0.01177, audio_tagging_loss=0.008623, over 3054736.15 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:01:42,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 578950 2023-11-29 07:01:47,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3859653.3333333335, ans=0.125 2023-11-29 07:01:52,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0 2023-11-29 07:02:11,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.159e+01 9.748e+01 1.040e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 07:02:11,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3859786.6666666665, ans=0.0 2023-11-29 07:02:13,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3859786.6666666665, ans=0.0 2023-11-29 07:02:30,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3859920.0, ans=0.125 2023-11-29 07:02:31,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3859920.0, ans=0.2 2023-11-29 07:02:44,390 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1850, loss[loss=0.08347, simple_loss=0.1232, pruned_loss=0.01486, audio_tagging_loss=0.006993, over 15658.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08946, pruned_loss=0.01193, audio_tagging_loss=0.008592, over 3052466.84 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:02:44,483 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579000 2023-11-29 07:03:01,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3860053.3333333335, ans=0.0 2023-11-29 07:03:07,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3860053.3333333335, ans=0.0 2023-11-29 07:03:13,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3860120.0, ans=0.1 2023-11-29 07:03:20,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3860186.6666666665, ans=0.0 2023-11-29 07:03:46,132 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1900, loss[loss=0.0717, simple_loss=0.103, pruned_loss=0.01414, audio_tagging_loss=0.006083, over 15857.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08899, pruned_loss=0.01187, audio_tagging_loss=0.008567, over 3050762.72 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:03:46,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579050 2023-11-29 07:03:51,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3860320.0, ans=0.125 2023-11-29 07:04:05,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3860386.6666666665, ans=0.07 2023-11-29 07:04:14,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.930e+01 9.376e+01 1.025e+02 1.828e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-29 07:04:38,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3860586.6666666665, ans=0.0 2023-11-29 07:04:47,591 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 1950, loss[loss=0.0764, simple_loss=0.1024, pruned_loss=0.01451, audio_tagging_loss=0.01069, over 15453.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.0887, pruned_loss=0.01189, audio_tagging_loss=0.008513, over 3049634.00 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:04:47,699 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579100 2023-11-29 07:05:00,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3860720.0, ans=0.125 2023-11-29 07:05:06,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3860720.0, ans=0.125 2023-11-29 07:05:11,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3860786.6666666665, ans=0.0 2023-11-29 07:05:22,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-29 07:05:48,980 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2000, loss[loss=0.08659, simple_loss=0.12, pruned_loss=0.0196, audio_tagging_loss=0.006967, over 13472.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08884, pruned_loss=0.01205, audio_tagging_loss=0.008456, over 3043270.92 frames. 
], batch size: 52, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:05:49,098 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579150 2023-11-29 07:05:56,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3860986.6666666665, ans=0.0 2023-11-29 07:06:16,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.275e+01 1.004e+02 1.066e+02 1.335e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 07:06:23,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3861120.0, ans=0.0 2023-11-29 07:06:31,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3861186.6666666665, ans=0.125 2023-11-29 07:06:32,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2023-11-29 07:06:39,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3861253.3333333335, ans=0.09899494936611666 2023-11-29 07:06:50,253 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2050, loss[loss=0.06767, simple_loss=0.09461, pruned_loss=0.01091, audio_tagging_loss=0.009448, over 16215.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08889, pruned_loss=0.01191, audio_tagging_loss=0.008458, over 3044577.78 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:06:50,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579200 2023-11-29 07:07:06,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3861386.6666666665, ans=0.05 2023-11-29 07:07:10,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3861386.6666666665, ans=0.125 2023-11-29 07:07:31,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3861520.0, ans=0.5 2023-11-29 07:07:40,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3861586.6666666665, ans=0.0 2023-11-29 07:07:51,818 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2100, loss[loss=0.06912, simple_loss=0.09785, pruned_loss=0.01253, audio_tagging_loss=0.00766, over 14747.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08925, pruned_loss=0.01192, audio_tagging_loss=0.008354, over 3037862.16 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:07:51,901 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579250 2023-11-29 07:08:00,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. 
limit=22.5 2023-11-29 07:08:03,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3861720.0, ans=0.125 2023-11-29 07:08:12,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3861720.0, ans=0.0 2023-11-29 07:08:20,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.068e+01 9.532e+01 1.017e+02 1.251e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 07:08:41,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-11-29 07:08:43,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3861920.0, ans=0.0 2023-11-29 07:08:46,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3861920.0, ans=0.125 2023-11-29 07:08:50,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3861920.0, ans=0.125 2023-11-29 07:08:52,572 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2150, loss[loss=0.05858, simple_loss=0.07832, pruned_loss=0.009757, audio_tagging_loss=0.009668, over 16490.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08903, pruned_loss=0.01196, audio_tagging_loss=0.008458, over 3037223.95 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:08:52,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579300 2023-11-29 07:09:02,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3861986.6666666665, ans=0.09899494936611666 2023-11-29 07:09:25,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862120.0, ans=0.1 2023-11-29 07:09:31,188 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:09:31,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862186.6666666665, ans=0.1 2023-11-29 07:09:32,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3862186.6666666665, ans=0.0 2023-11-29 07:09:42,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2023-11-29 07:09:51,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3862253.3333333335, ans=0.125 2023-11-29 07:09:55,051 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2200, loss[loss=0.07117, simple_loss=0.09549, pruned_loss=0.01546, audio_tagging_loss=0.007964, over 16180.00 frames. 
], tot_loss[loss=0.06556, simple_loss=0.08987, pruned_loss=0.01217, audio_tagging_loss=0.008456, over 3042527.31 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:09:55,133 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579350 2023-11-29 07:10:15,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3862386.6666666665, ans=0.125 2023-11-29 07:10:19,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3862453.3333333335, ans=0.2 2023-11-29 07:10:21,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:10:22,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.191e+01 9.631e+01 1.029e+02 1.249e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 07:10:34,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3862520.0, ans=0.125 2023-11-29 07:10:37,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3862520.0, ans=0.05 2023-11-29 07:10:55,453 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2250, loss[loss=0.05064, simple_loss=0.0665, pruned_loss=0.009033, audio_tagging_loss=0.008351, over 14916.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08892, pruned_loss=0.01203, audio_tagging_loss=0.008534, over 3042066.73 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:10:55,539 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579400 2023-11-29 07:11:00,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3862653.3333333335, ans=0.125 2023-11-29 07:11:04,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3862653.3333333335, ans=0.125 2023-11-29 07:11:11,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3862720.0, ans=0.0 2023-11-29 07:11:17,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3862720.0, ans=0.0 2023-11-29 07:11:27,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0 2023-11-29 07:11:51,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3862920.0, ans=0.015 2023-11-29 07:11:56,117 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2300, loss[loss=0.0458, simple_loss=0.06434, pruned_loss=0.006123, audio_tagging_loss=0.007506, over 13441.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08975, pruned_loss=0.01219, audio_tagging_loss=0.008531, over 3043132.89 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:11:56,203 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579450 2023-11-29 07:12:12,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3863053.3333333335, ans=0.125 2023-11-29 07:12:26,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.023e+01 9.872e+01 1.066e+02 2.413e+02, threshold=1.974e+02, percent-clipped=1.0 2023-11-29 07:12:37,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3863186.6666666665, ans=0.1 2023-11-29 07:12:49,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3863253.3333333335, ans=0.125 2023-11-29 07:12:52,615 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:12:59,109 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2350, loss[loss=0.08215, simple_loss=0.1108, pruned_loss=0.01749, audio_tagging_loss=0.009273, over 15429.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08936, pruned_loss=0.01216, audio_tagging_loss=0.008685, over 3038640.52 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:12:59,211 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579500 2023-11-29 07:13:13,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3863386.6666666665, ans=0.0 2023-11-29 07:13:22,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3863453.3333333335, ans=0.125 2023-11-29 07:13:23,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3863453.3333333335, ans=0.0 2023-11-29 07:13:41,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-11-29 07:13:49,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2023-11-29 07:14:00,815 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2400, loss[loss=0.05691, simple_loss=0.07637, pruned_loss=0.008284, audio_tagging_loss=0.01044, over 15130.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09037, pruned_loss=0.01238, audio_tagging_loss=0.008693, over 3048684.27 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:14:00,897 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579550 2023-11-29 07:14:09,257 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:14:16,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3863720.0, ans=0.1 2023-11-29 07:14:19,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3863720.0, ans=0.1 2023-11-29 07:14:29,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-29 07:14:29,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 9.216e+01 9.806e+01 1.047e+02 1.244e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 07:14:31,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3863786.6666666665, ans=0.125 2023-11-29 07:14:45,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3863853.3333333335, ans=0.0 2023-11-29 07:15:00,886 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2450, loss[loss=0.08721, simple_loss=0.1141, pruned_loss=0.01953, audio_tagging_loss=0.01065, over 15978.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09005, pruned_loss=0.01221, audio_tagging_loss=0.008775, over 3048515.40 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:15:01,006 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579600 2023-11-29 07:15:08,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3863986.6666666665, ans=0.125 2023-11-29 07:15:14,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3864053.3333333335, ans=0.125 2023-11-29 07:15:15,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-11-29 07:15:23,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:16:01,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-11-29 07:16:02,336 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2500, loss[loss=0.06973, simple_loss=0.09947, pruned_loss=0.01202, audio_tagging_loss=0.007978, over 15318.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08927, pruned_loss=0.012, audio_tagging_loss=0.008887, over 3043173.31 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:16:02,420 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579650 2023-11-29 07:16:07,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3864320.0, ans=0.07 2023-11-29 07:16:26,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3864453.3333333335, ans=0.125 2023-11-29 07:16:31,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.100e+01 9.554e+01 1.019e+02 1.302e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 07:16:37,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2023-11-29 07:16:38,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3864520.0, ans=0.125 2023-11-29 07:16:46,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-29 07:16:48,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3864520.0, ans=15.0 2023-11-29 07:16:53,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3864586.6666666665, ans=0.125 2023-11-29 07:17:00,690 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:17:04,445 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2550, loss[loss=0.07779, simple_loss=0.1001, pruned_loss=0.01871, audio_tagging_loss=0.009009, over 14548.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.089, pruned_loss=0.01212, audio_tagging_loss=0.008861, over 3048758.75 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:17:04,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579700 2023-11-29 07:17:07,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3864653.3333333335, ans=0.1 2023-11-29 07:17:26,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3864720.0, ans=0.0 2023-11-29 07:17:32,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.87 vs. 
limit=15.0 2023-11-29 07:17:34,724 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:17:40,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3864853.3333333335, ans=0.1 2023-11-29 07:17:43,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3864853.3333333335, ans=0.125 2023-11-29 07:17:49,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3864853.3333333335, ans=0.2 2023-11-29 07:17:54,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3864920.0, ans=0.125 2023-11-29 07:18:05,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-29 07:18:05,625 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2600, loss[loss=0.04817, simple_loss=0.05943, pruned_loss=0.01052, audio_tagging_loss=0.007931, over 15079.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.0889, pruned_loss=0.01194, audio_tagging_loss=0.0087, over 3049631.30 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:18:05,713 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579750 2023-11-29 07:18:10,702 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:18:22,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2023-11-29 07:18:26,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-11-29 07:18:31,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3865120.0, ans=0.1 2023-11-29 07:18:36,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.765e+01 9.414e+01 9.856e+01 2.856e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-29 07:18:47,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3865186.6666666665, ans=0.125 2023-11-29 07:18:55,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3865253.3333333335, ans=0.125 2023-11-29 07:19:02,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3865253.3333333335, ans=0.1 2023-11-29 07:19:05,829 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2650, loss[loss=0.0488, simple_loss=0.06976, pruned_loss=0.005471, audio_tagging_loss=0.008451, over 15651.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.0895, pruned_loss=0.012, audio_tagging_loss=0.008567, over 3041790.13 frames. 
], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:19:05,926 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579800 2023-11-29 07:19:10,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3865320.0, ans=0.02 2023-11-29 07:19:14,895 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:19:19,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2023-11-29 07:19:30,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3865453.3333333335, ans=0.0 2023-11-29 07:19:36,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3865453.3333333335, ans=0.125 2023-11-29 07:19:48,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2023-11-29 07:19:59,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3865586.6666666665, ans=0.125 2023-11-29 07:20:06,925 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2700, loss[loss=0.07166, simple_loss=0.1056, pruned_loss=0.01112, audio_tagging_loss=0.007739, over 15691.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09007, pruned_loss=0.01206, audio_tagging_loss=0.008496, over 3046974.44 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:20:07,028 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579850 2023-11-29 07:20:34,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3865786.6666666665, ans=0.125 2023-11-29 07:20:36,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 9.168e+01 9.728e+01 1.035e+02 1.379e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 07:20:48,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3865853.3333333335, ans=0.0 2023-11-29 07:20:55,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3865920.0, ans=0.1 2023-11-29 07:21:01,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3865920.0, ans=0.125 2023-11-29 07:21:01,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3865920.0, ans=0.0 2023-11-29 07:21:02,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3865920.0, ans=0.125 2023-11-29 07:21:06,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5 2023-11-29 07:21:07,819 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2750, loss[loss=0.05544, simple_loss=0.07643, pruned_loss=0.007536, audio_tagging_loss=0.009688, over 14873.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08951, pruned_loss=0.01192, audio_tagging_loss=0.008504, over 3049454.75 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:21:07,901 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579900 2023-11-29 07:21:55,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3866253.3333333335, ans=0.125 2023-11-29 07:21:57,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3866253.3333333335, ans=0.025 2023-11-29 07:21:59,957 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:22:04,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-11-29 07:22:08,104 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2800, loss[loss=0.04345, simple_loss=0.06272, pruned_loss=0.004817, audio_tagging_loss=0.007271, over 14851.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08908, pruned_loss=0.01191, audio_tagging_loss=0.008595, over 3050544.21 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:22:08,186 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 579950 2023-11-29 07:22:15,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2023-11-29 07:22:16,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3866320.0, ans=0.125 2023-11-29 07:22:20,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3866386.6666666665, ans=0.0 2023-11-29 07:22:33,593 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:22:37,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3866453.3333333335, ans=0.0 2023-11-29 07:22:37,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3866453.3333333335, ans=0.125 2023-11-29 07:22:39,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.979e+01 9.442e+01 1.009e+02 1.188e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 07:22:53,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3866520.0, ans=0.125 2023-11-29 07:23:09,383 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2850, loss[loss=0.07574, simple_loss=0.1034, pruned_loss=0.0142, audio_tagging_loss=0.00984, over 15703.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08948, pruned_loss=0.01196, audio_tagging_loss=0.008572, over 3040982.39 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:23:09,485 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580000 2023-11-29 07:23:25,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3866720.0, ans=0.125 2023-11-29 07:23:39,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3866786.6666666665, ans=0.125 2023-11-29 07:23:50,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2023-11-29 07:23:57,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3866853.3333333335, ans=0.125 2023-11-29 07:24:09,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3866920.0, ans=0.125 2023-11-29 07:24:13,555 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2900, loss[loss=0.06895, simple_loss=0.09301, pruned_loss=0.01419, audio_tagging_loss=0.008248, over 15324.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08969, pruned_loss=0.01206, audio_tagging_loss=0.008482, over 3035390.61 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:24:13,651 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580050 2023-11-29 07:24:26,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3867053.3333333335, ans=0.125 2023-11-29 07:24:44,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.980e+01 9.788e+01 1.062e+02 1.550e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 07:24:53,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3867186.6666666665, ans=0.125 2023-11-29 07:24:59,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3867186.6666666665, ans=0.0 2023-11-29 07:24:59,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-29 07:25:01,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2023-11-29 07:25:14,068 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 2950, loss[loss=0.05613, simple_loss=0.08062, pruned_loss=0.00808, audio_tagging_loss=0.007738, over 15157.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08994, pruned_loss=0.01201, audio_tagging_loss=0.008432, over 3041190.95 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:25:14,155 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580100 2023-11-29 07:25:19,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.61 vs. 
limit=15.0 2023-11-29 07:25:21,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3867320.0, ans=0.125 2023-11-29 07:25:38,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3867453.3333333335, ans=0.1 2023-11-29 07:26:15,563 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3000, loss[loss=0.07881, simple_loss=0.1031, pruned_loss=0.01725, audio_tagging_loss=0.01, over 14564.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08996, pruned_loss=0.01197, audio_tagging_loss=0.008512, over 3034243.08 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:26:15,564 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 07:26:54,604 INFO [train_asr.py:1267] (1/4) Epoch 49, validation: loss=0.05747, simple_loss=0.05054, pruned_loss=0.005474, audio_tagging_loss=0.02673, over 4681554.00 frames. 2023-11-29 07:26:54,605 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 07:26:54,690 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580150 2023-11-29 07:26:57,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3867653.3333333335, ans=0.125 2023-11-29 07:27:10,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3867720.0, ans=0.125 2023-11-29 07:27:26,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 9.023e+01 9.601e+01 1.027e+02 1.356e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 07:27:30,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=22.5 2023-11-29 07:27:35,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3867853.3333333335, ans=0.0 2023-11-29 07:27:55,397 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3050, loss[loss=0.07499, simple_loss=0.1007, pruned_loss=0.01558, audio_tagging_loss=0.00908, over 15786.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09086, pruned_loss=0.01224, audio_tagging_loss=0.008504, over 3032269.96 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:27:55,480 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580200 2023-11-29 07:27:55,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3867986.6666666665, ans=0.0 2023-11-29 07:28:32,412 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
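The recurring "Exclude cut" WARNINGs in this section are a data-sanity filter rather than an error: each flagged clip is a 1-second AudioSet cut whose 100 feature frames shrink to 23 after subsampling, while its placeholder transcript encodes to 24 BPE tokens, and the pruned-transducer loss appears to require at least as many frames as tokens. A minimal sketch of such a filter follows; the helper name and the exact subsampled-length formula are illustrative assumptions, not the actual icefall code.

```python
# Hedged sketch of the filter implied by the "Exclude cut" warnings:
# drop any cut whose token count reaches or exceeds its frame count
# after subsampling, since no alignment would fit. `is_trainable` and
# the length formula are assumptions for illustration only.
def is_trainable(cut, sp, subsampling_factor: int = 4) -> bool:
    num_frames = cut.num_frames                                  # 100 for the 1.0 s cuts here
    num_frames_after = (num_frames - 7) // subsampling_factor    # -> 23
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)   # -> 24 tokens
    return len(tokens) < num_frames_after                        # 24 < 23 is False: excluded
```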
Number of tokens: 24 2023-11-29 07:28:32,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3868186.6666666665, ans=0.125 2023-11-29 07:28:55,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3868253.3333333335, ans=0.0 2023-11-29 07:28:57,729 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3100, loss[loss=0.05246, simple_loss=0.07304, pruned_loss=0.006473, audio_tagging_loss=0.009468, over 15513.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08979, pruned_loss=0.01208, audio_tagging_loss=0.008651, over 3036256.61 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:28:57,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580250 2023-11-29 07:29:02,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-29 07:29:17,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3868386.6666666665, ans=0.2 2023-11-29 07:29:22,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-11-29 07:29:29,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.858e+01 9.570e+01 1.021e+02 1.337e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 07:29:56,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3868586.6666666665, ans=0.04949747468305833 2023-11-29 07:29:59,537 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3150, loss[loss=0.05826, simple_loss=0.0809, pruned_loss=0.008176, audio_tagging_loss=0.009632, over 16163.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08984, pruned_loss=0.012, audio_tagging_loss=0.008747, over 3036868.09 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:29:59,623 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580300 2023-11-29 07:30:06,480 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:30:25,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3868786.6666666665, ans=0.1 2023-11-29 07:30:26,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3868786.6666666665, ans=0.2 2023-11-29 07:30:29,273 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:30:41,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-11-29 07:30:45,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. 
limit=10.0 2023-11-29 07:30:55,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3868920.0, ans=0.0 2023-11-29 07:31:00,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3868986.6666666665, ans=0.125 2023-11-29 07:31:01,049 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3200, loss[loss=0.09348, simple_loss=0.129, pruned_loss=0.02204, audio_tagging_loss=0.006936, over 15468.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09037, pruned_loss=0.01213, audio_tagging_loss=0.00877, over 3041778.67 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:31:01,127 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580350 2023-11-29 07:31:15,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-11-29 07:31:29,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3869120.0, ans=0.125 2023-11-29 07:31:33,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 8.935e+01 9.459e+01 1.020e+02 1.289e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:31:36,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3869186.6666666665, ans=0.0 2023-11-29 07:32:02,190 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3250, loss[loss=0.09139, simple_loss=0.1262, pruned_loss=0.02031, audio_tagging_loss=0.007958, over 15021.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09038, pruned_loss=0.01211, audio_tagging_loss=0.008839, over 3040119.64 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:32:02,275 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580400 2023-11-29 07:32:03,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3869320.0, ans=0.125 2023-11-29 07:32:13,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3869320.0, ans=0.2 2023-11-29 07:32:26,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2023-11-29 07:32:33,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3869453.3333333335, ans=0.125 2023-11-29 07:32:36,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3869453.3333333335, ans=0.0 2023-11-29 07:32:41,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.56 vs. limit=22.5 2023-11-29 07:32:41,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2023-11-29 07:32:47,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.34 vs. 
limit=22.5 2023-11-29 07:32:49,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3869520.0, ans=0.05 2023-11-29 07:32:54,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3869586.6666666665, ans=0.035 2023-11-29 07:33:00,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0 2023-11-29 07:33:04,500 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3300, loss[loss=0.06236, simple_loss=0.07876, pruned_loss=0.01476, audio_tagging_loss=0.008209, over 14946.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08959, pruned_loss=0.01202, audio_tagging_loss=0.008928, over 3042577.89 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:33:04,588 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580450 2023-11-29 07:33:05,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3869653.3333333335, ans=0.5 2023-11-29 07:33:13,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3869653.3333333335, ans=0.125 2023-11-29 07:33:30,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3869786.6666666665, ans=0.0 2023-11-29 07:33:37,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.902e+01 9.466e+01 1.005e+02 1.164e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 07:33:40,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3869853.3333333335, ans=0.125 2023-11-29 07:33:53,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-29 07:34:04,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3869920.0, ans=0.125 2023-11-29 07:34:06,875 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3350, loss[loss=0.05507, simple_loss=0.07399, pruned_loss=0.006592, audio_tagging_loss=0.01148, over 14849.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09, pruned_loss=0.01191, audio_tagging_loss=0.008857, over 3043692.04 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:34:06,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580500 2023-11-29 07:34:19,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3870053.3333333335, ans=0.1 2023-11-29 07:35:03,369 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:35:08,988 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3400, loss[loss=0.07267, simple_loss=0.1037, pruned_loss=0.01124, audio_tagging_loss=0.009594, over 14325.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08968, pruned_loss=0.0118, audio_tagging_loss=0.008778, over 3043416.12 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:35:09,081 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580550 2023-11-29 07:35:12,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3870320.0, ans=0.0 2023-11-29 07:35:12,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3870320.0, ans=0.125 2023-11-29 07:35:13,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3870320.0, ans=15.0 2023-11-29 07:35:32,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3870453.3333333335, ans=0.125 2023-11-29 07:35:33,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3870453.3333333335, ans=0.0 2023-11-29 07:35:39,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3870453.3333333335, ans=0.125 2023-11-29 07:35:41,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.012e+01 9.460e+01 1.056e+02 1.309e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:35:46,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3870520.0, ans=0.1 2023-11-29 07:36:11,824 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3450, loss[loss=0.06763, simple_loss=0.08662, pruned_loss=0.01347, audio_tagging_loss=0.01085, over 15099.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08986, pruned_loss=0.01187, audio_tagging_loss=0.008644, over 3042336.29 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:36:11,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580600 2023-11-29 07:36:12,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3870653.3333333335, ans=0.125 2023-11-29 07:36:15,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3870653.3333333335, ans=0.125 2023-11-29 07:36:29,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3870720.0, ans=0.0 2023-11-29 07:37:13,501 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3500, loss[loss=0.08588, simple_loss=0.1226, pruned_loss=0.01744, audio_tagging_loss=0.00716, over 15652.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08939, pruned_loss=0.0117, audio_tagging_loss=0.008548, over 3038141.82 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:37:13,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580650 2023-11-29 07:37:28,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3871053.3333333335, ans=0.07 2023-11-29 07:37:45,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3871120.0, ans=0.0 2023-11-29 07:37:47,422 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
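As a spot check, the tot_loss fields in these records are consistent with a fixed weighting of the three components, loss ≈ 0.5·simple_loss + pruned_loss + 1.0·audio_tagging_loss, reproduced below for the "Epoch 49, batch 3450" record above; the weights are inferred from the logged numbers themselves, so treat this as a consistency check rather than the training code.

```python
# Recompute tot_loss for Epoch 49, batch 3450 from its logged components.
# The 0.5 / 1.0 weights are inferred from the log, not read from train_asr.py.
simple_loss, pruned_loss, audio_tagging_loss = 0.08986, 0.01187, 0.008644
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # 0.06544 -- matches tot_loss[loss=0.06544, ...]
```

The same weighting reproduces the other tot_loss records in this section to within the rounding of the printed components.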
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:37:48,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.986e+01 9.811e+01 1.065e+02 1.473e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 07:37:57,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3871186.6666666665, ans=0.0 2023-11-29 07:38:05,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3871253.3333333335, ans=10.0 2023-11-29 07:38:07,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2023-11-29 07:38:16,960 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3550, loss[loss=0.06758, simple_loss=0.08217, pruned_loss=0.01357, audio_tagging_loss=0.01292, over 14884.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08944, pruned_loss=0.01173, audio_tagging_loss=0.008562, over 3034666.49 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:38:17,081 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580700 2023-11-29 07:38:32,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3871386.6666666665, ans=0.125 2023-11-29 07:38:33,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3871386.6666666665, ans=0.0 2023-11-29 07:38:33,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3871386.6666666665, ans=0.1 2023-11-29 07:38:39,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3871386.6666666665, ans=0.125 2023-11-29 07:38:41,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3871453.3333333335, ans=0.0 2023-11-29 07:38:42,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3871453.3333333335, ans=0.125 2023-11-29 07:39:18,695 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3600, loss[loss=0.07284, simple_loss=0.1028, pruned_loss=0.01391, audio_tagging_loss=0.007536, over 14921.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08872, pruned_loss=0.01163, audio_tagging_loss=0.00855, over 3034796.94 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:39:18,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580750 2023-11-29 07:39:19,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2023-11-29 07:39:28,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3871653.3333333335, ans=0.125 2023-11-29 07:39:33,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3871720.0, ans=0.125 2023-11-29 07:39:51,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.727e+01 9.343e+01 1.017e+02 1.458e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-29 07:39:53,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3871786.6666666665, ans=0.0 2023-11-29 07:40:17,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3871920.0, ans=0.0 2023-11-29 07:40:19,967 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3650, loss[loss=0.06952, simple_loss=0.09314, pruned_loss=0.01265, audio_tagging_loss=0.0103, over 15578.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08852, pruned_loss=0.01168, audio_tagging_loss=0.008514, over 3035060.90 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:40:20,084 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580800 2023-11-29 07:41:05,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3872186.6666666665, ans=0.125 2023-11-29 07:41:06,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3872186.6666666665, ans=0.0 2023-11-29 07:41:21,639 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3700, loss[loss=0.05865, simple_loss=0.07555, pruned_loss=0.01433, audio_tagging_loss=0.006544, over 14611.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08848, pruned_loss=0.01172, audio_tagging_loss=0.008525, over 3045656.69 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:41:21,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580850 2023-11-29 07:41:35,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3872386.6666666665, ans=0.125 2023-11-29 07:41:56,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.236e+01 9.960e+01 1.067e+02 1.392e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-29 07:42:24,375 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3750, loss[loss=0.07188, simple_loss=0.1002, pruned_loss=0.0132, audio_tagging_loss=0.008587, over 15567.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08799, pruned_loss=0.01171, audio_tagging_loss=0.008596, over 3045269.59 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:42:24,490 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580900 2023-11-29 07:43:09,384 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
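The grad_scale field steps between 16.0 and 32.0 across the surrounding records (e.g. 16.0 at batch 3550, 32.0 at batch 3600). That pattern is what dynamic loss scaling produces in mixed-precision training: the scale is halved when a step overflows and grown again after a run of clean steps. Below is a generic sketch with torch.cuda.amp, assuming standard GradScaler behavior; the actual loop in train_asr.py may use a different growth schedule.

```python
import torch

def train_steps(model, optimizer, loader):
    # Dynamic loss scaling: the scale halves on an inf/nan gradient step and
    # grows after a stretch of clean steps, which would yield the
    # 16.0 <-> 32.0 grad_scale swings seen in the log.
    scaler = torch.cuda.amp.GradScaler()
    for batch in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)        # model assumed to return a scalar loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)         # skipped internally if grads overflowed
        scaler.update()                # adjusts the scale reported as grad_scale
```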
Number of tokens: 24 2023-11-29 07:43:10,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3872853.3333333335, ans=0.125 2023-11-29 07:43:22,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2023-11-29 07:43:25,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3872986.6666666665, ans=0.02 2023-11-29 07:43:26,305 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3800, loss[loss=0.08719, simple_loss=0.1312, pruned_loss=0.01663, audio_tagging_loss=0.004941, over 15119.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08829, pruned_loss=0.01181, audio_tagging_loss=0.008599, over 3037755.26 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:43:26,437 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 580950 2023-11-29 07:43:39,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3873053.3333333335, ans=0.0 2023-11-29 07:43:41,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5 2023-11-29 07:43:49,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3873120.0, ans=0.0 2023-11-29 07:44:01,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.972e+01 9.513e+01 1.036e+02 1.364e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 07:44:10,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3873186.6666666665, ans=0.0 2023-11-29 07:44:28,023 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3850, loss[loss=0.06117, simple_loss=0.08918, pruned_loss=0.009096, audio_tagging_loss=0.007488, over 15732.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08874, pruned_loss=0.01177, audio_tagging_loss=0.008735, over 3032452.37 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:44:28,130 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581000 2023-11-29 07:44:30,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3873320.0, ans=0.1 2023-11-29 07:44:44,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3873386.6666666665, ans=0.0 2023-11-29 07:45:19,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3873586.6666666665, ans=0.0 2023-11-29 07:45:30,904 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3900, loss[loss=0.07522, simple_loss=0.1047, pruned_loss=0.01503, audio_tagging_loss=0.007833, over 14780.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08915, pruned_loss=0.01177, audio_tagging_loss=0.008718, over 3029788.18 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:45:30,993 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581050 2023-11-29 07:45:39,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3873653.3333333335, ans=0.125 2023-11-29 07:45:47,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-11-29 07:46:04,758 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.794e+01 9.412e+01 1.023e+02 1.323e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 07:46:31,879 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 3950, loss[loss=0.07265, simple_loss=0.09856, pruned_loss=0.01439, audio_tagging_loss=0.008978, over 15246.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08882, pruned_loss=0.01176, audio_tagging_loss=0.008796, over 3029862.27 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:46:31,961 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581100 2023-11-29 07:46:46,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3874053.3333333335, ans=0.0 2023-11-29 07:46:47,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3874053.3333333335, ans=0.5 2023-11-29 07:47:00,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3874120.0, ans=0.125 2023-11-29 07:47:06,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3874120.0, ans=0.2 2023-11-29 07:47:11,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3874186.6666666665, ans=0.125 2023-11-29 07:47:30,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3874253.3333333335, ans=0.125 2023-11-29 07:47:32,142 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4000, loss[loss=0.07161, simple_loss=0.1067, pruned_loss=0.01278, audio_tagging_loss=0.005459, over 15832.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09037, pruned_loss=0.01203, audio_tagging_loss=0.00877, over 3039945.67 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:47:32,248 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581150 2023-11-29 07:48:08,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 9.122e+01 9.589e+01 1.060e+02 2.170e+02, threshold=1.918e+02, percent-clipped=1.0 2023-11-29 07:48:14,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874520.0, ans=0.1 2023-11-29 07:48:33,332 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4050, loss[loss=0.07634, simple_loss=0.1141, pruned_loss=0.01293, audio_tagging_loss=0.006367, over 15370.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09026, pruned_loss=0.01198, audio_tagging_loss=0.008728, over 3038302.09 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:48:33,423 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581200 2023-11-29 07:48:37,552 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. 
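In the optim.py records, the five grad-norm numbers read as (min, 25%, median, 75%, max) of recently observed gradient norms, and throughout this section the reported threshold equals Clipping_scale times the median, e.g. 2.0 × 9.412e+01 = 1.882e+02 in the record above. A sketch under that assumption follows; the history-tracking details are guessed, not taken from optim.py.

```python
import torch

def clipping_stats(grad_norm_history: torch.Tensor, clipping_scale: float = 2.0):
    # Order statistics of recent per-step gradient norms; the threshold is
    # clipping_scale times the running median, matching every record here.
    q = torch.quantile(grad_norm_history,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()   # e.g. 2.0 * 94.12 = 188.24
    return q, threshold
```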
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:48:50,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3874720.0, ans=0.125 2023-11-29 07:49:00,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2023-11-29 07:49:22,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3874920.0, ans=0.125 2023-11-29 07:49:32,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3874920.0, ans=0.125 2023-11-29 07:49:35,744 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4100, loss[loss=0.05112, simple_loss=0.06542, pruned_loss=0.008312, audio_tagging_loss=0.0101, over 15629.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09064, pruned_loss=0.01198, audio_tagging_loss=0.008677, over 3030313.87 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:49:35,829 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581250 2023-11-29 07:49:47,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3875053.3333333335, ans=0.125 2023-11-29 07:50:00,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=15.0 2023-11-29 07:50:11,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.112e+01 9.700e+01 1.031e+02 1.226e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 07:50:16,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3875186.6666666665, ans=0.025 2023-11-29 07:50:23,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=12.0 2023-11-29 07:50:25,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3875253.3333333335, ans=0.0 2023-11-29 07:50:27,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3875253.3333333335, ans=0.0 2023-11-29 07:50:36,468 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4150, loss[loss=0.07067, simple_loss=0.09104, pruned_loss=0.01531, audio_tagging_loss=0.009843, over 16089.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09071, pruned_loss=0.01204, audio_tagging_loss=0.008619, over 3036664.42 frames. 
], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:50:36,572 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581300 2023-11-29 07:50:42,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3875320.0, ans=0.125 2023-11-29 07:50:50,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3875386.6666666665, ans=0.125 2023-11-29 07:50:52,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3875386.6666666665, ans=0.125 2023-11-29 07:51:16,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3875520.0, ans=0.0 2023-11-29 07:51:22,066 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:51:33,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-11-29 07:51:37,800 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4200, loss[loss=0.04325, simple_loss=0.05958, pruned_loss=0.004922, audio_tagging_loss=0.008537, over 15677.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0901, pruned_loss=0.01204, audio_tagging_loss=0.008568, over 3038842.09 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:51:37,912 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581350 2023-11-29 07:51:50,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3875720.0, ans=0.05 2023-11-29 07:51:50,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3875720.0, ans=0.2 2023-11-29 07:52:13,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.081e+01 9.650e+01 1.017e+02 1.202e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 07:52:21,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3875853.3333333335, ans=0.0 2023-11-29 07:52:22,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3875853.3333333335, ans=0.125 2023-11-29 07:52:39,540 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4250, loss[loss=0.08321, simple_loss=0.12, pruned_loss=0.01671, audio_tagging_loss=0.006502, over 15630.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09001, pruned_loss=0.01202, audio_tagging_loss=0.008512, over 3045661.36 frames. 
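
The tot_loss fields are internally consistent with a weighted sum of the three components, apparently total = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. A quick check against the batch 4250 averages just above (the scale factors here are read off from the numbers, not taken from the code):

```python
# Verify the loss decomposition against the batch 4250 tot_loss line above.
simple_loss, pruned_loss, audio_tagging_loss = 0.09001, 0.01202, 0.008512
tot = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
assert abs(tot - 0.06554) < 5e-5  # matches tot_loss[loss=0.06554, ...]
```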
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:52:39,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581400 2023-11-29 07:52:40,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3875986.6666666665, ans=0.125 2023-11-29 07:52:41,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2023-11-29 07:53:11,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3876120.0, ans=0.1 2023-11-29 07:53:13,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2023-11-29 07:53:41,508 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4300, loss[loss=0.07997, simple_loss=0.1127, pruned_loss=0.01618, audio_tagging_loss=0.007422, over 15356.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.0899, pruned_loss=0.01192, audio_tagging_loss=0.008394, over 3049414.58 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:53:41,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581450 2023-11-29 07:53:54,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3876386.6666666665, ans=0.1 2023-11-29 07:54:12,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3876453.3333333335, ans=0.1 2023-11-29 07:54:16,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 9.277e+01 9.932e+01 1.054e+02 1.240e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 07:54:27,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3876520.0, ans=0.04949747468305833 2023-11-29 07:54:30,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3876586.6666666665, ans=0.125 2023-11-29 07:54:30,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2023-11-29 07:54:42,952 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4350, loss[loss=0.05831, simple_loss=0.08062, pruned_loss=0.0109, audio_tagging_loss=0.007099, over 15333.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08954, pruned_loss=0.01197, audio_tagging_loss=0.008371, over 3038903.08 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:54:43,065 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581500 2023-11-29 07:54:56,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3876720.0, ans=0.0 2023-11-29 07:54:59,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. 
limit=22.5 2023-11-29 07:55:07,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3876786.6666666665, ans=0.125 2023-11-29 07:55:21,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-29 07:55:37,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3876920.0, ans=10.0 2023-11-29 07:55:38,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3876920.0, ans=0.125 2023-11-29 07:55:44,997 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4400, loss[loss=0.0462, simple_loss=0.05799, pruned_loss=0.006379, audio_tagging_loss=0.01083, over 14865.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08902, pruned_loss=0.01187, audio_tagging_loss=0.008493, over 3037572.59 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:55:45,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581550 2023-11-29 07:55:53,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3876986.6666666665, ans=0.95 2023-11-29 07:56:10,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3877120.0, ans=0.0 2023-11-29 07:56:11,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2023-11-29 07:56:17,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-11-29 07:56:19,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2023-11-29 07:56:20,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3877186.6666666665, ans=0.0 2023-11-29 07:56:21,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 9.242e+01 9.842e+01 1.066e+02 1.310e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 07:56:32,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3877186.6666666665, ans=0.09899494936611666 2023-11-29 07:56:33,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3877253.3333333335, ans=0.125 2023-11-29 07:56:46,475 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4450, loss[loss=0.06431, simple_loss=0.08854, pruned_loss=0.01259, audio_tagging_loss=0.007443, over 15676.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08965, pruned_loss=0.01188, audio_tagging_loss=0.008376, over 3039747.97 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:56:46,597 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581600 2023-11-29 07:56:56,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3877320.0, ans=0.125 2023-11-29 07:57:11,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3877453.3333333335, ans=0.0 2023-11-29 07:57:48,367 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4500, loss[loss=0.07455, simple_loss=0.1033, pruned_loss=0.01447, audio_tagging_loss=0.008436, over 15332.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08913, pruned_loss=0.01173, audio_tagging_loss=0.008372, over 3043951.16 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:57:48,454 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581650 2023-11-29 07:57:48,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3877653.3333333335, ans=0.125 2023-11-29 07:57:53,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2023-11-29 07:58:12,058 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:58:25,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.167e+01 9.852e+01 1.040e+02 1.276e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 07:58:27,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3877853.3333333335, ans=0.125 2023-11-29 07:58:29,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3877853.3333333335, ans=0.1 2023-11-29 07:58:50,538 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4550, loss[loss=0.06267, simple_loss=0.08714, pruned_loss=0.01041, audio_tagging_loss=0.008699, over 15654.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08951, pruned_loss=0.0118, audio_tagging_loss=0.008396, over 3038865.98 frames. 
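
In the optim.py lines, the clipping threshold tracks clipping_scale times the running median gradient norm: here 2.0 * 9.852e+01 ~ 1.970e+02, and percent-clipped stays at 0.0 because even the largest recent norm (1.276e+02) is below that threshold. A sketch of the bookkeeping over a buffer of recent gradient norms (assumed mechanics; ScaledAdam's actual implementation differs in detail):

```python
import torch

# Reconstruct one "grad-norm quartiles ... threshold ... percent-clipped" line.
recent_norms = torch.tensor([76.0, 91.7, 98.5, 104.0, 127.6])  # recent grad norms

clipping_scale = 2.0
q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
threshold = clipping_scale * q[2]  # 2x the median, as in the log lines
percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
print(threshold.item(), percent_clipped.item())  # ~197.0, 0.0
```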
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:58:50,624 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581700 2023-11-29 07:59:00,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3877986.6666666665, ans=0.125 2023-11-29 07:59:01,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3878053.3333333335, ans=0.0 2023-11-29 07:59:02,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3878053.3333333335, ans=0.0 2023-11-29 07:59:02,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3878053.3333333335, ans=0.0 2023-11-29 07:59:17,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3878120.0, ans=0.1 2023-11-29 07:59:29,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3878186.6666666665, ans=0.1 2023-11-29 07:59:37,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3878186.6666666665, ans=0.0 2023-11-29 07:59:38,689 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:59:40,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3878253.3333333335, ans=0.0 2023-11-29 07:59:41,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3878253.3333333335, ans=0.2 2023-11-29 07:59:49,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3878253.3333333335, ans=0.0 2023-11-29 07:59:49,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3878253.3333333335, ans=0.125 2023-11-29 07:59:51,491 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4600, loss[loss=0.0518, simple_loss=0.07496, pruned_loss=0.005685, audio_tagging_loss=0.008637, over 14669.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08965, pruned_loss=0.01187, audio_tagging_loss=0.00844, over 3043504.63 frames. 
], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:59:51,585 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581750 2023-11-29 08:00:07,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3878386.6666666665, ans=0.0 2023-11-29 08:00:09,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3878386.6666666665, ans=0.1 2023-11-29 08:00:29,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.974e+01 9.623e+01 1.050e+02 1.439e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 08:00:47,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3878586.6666666665, ans=0.07 2023-11-29 08:00:53,005 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4650, loss[loss=0.06275, simple_loss=0.08549, pruned_loss=0.01201, audio_tagging_loss=0.008, over 14115.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09017, pruned_loss=0.01196, audio_tagging_loss=0.00851, over 3046091.85 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:00:53,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581800 2023-11-29 08:00:54,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2023-11-29 08:01:01,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=15.0 2023-11-29 08:01:47,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3878920.0, ans=0.1 2023-11-29 08:01:48,916 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:01:56,863 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4700, loss[loss=0.0627, simple_loss=0.08463, pruned_loss=0.0104, audio_tagging_loss=0.009977, over 15929.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08978, pruned_loss=0.01178, audio_tagging_loss=0.008567, over 3049875.12 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:01:56,987 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581850 2023-11-29 08:02:05,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3878986.6666666665, ans=0.1 2023-11-29 08:02:12,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. 
limit=15.0 2023-11-29 08:02:15,289 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:02:33,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.091e+01 9.646e+01 1.031e+02 1.253e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 08:02:35,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3879186.6666666665, ans=0.0 2023-11-29 08:02:39,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3879186.6666666665, ans=0.0 2023-11-29 08:02:58,721 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4750, loss[loss=0.06599, simple_loss=0.09153, pruned_loss=0.01028, audio_tagging_loss=0.009943, over 14515.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08964, pruned_loss=0.01179, audio_tagging_loss=0.008646, over 3050576.29 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:02:58,852 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581900 2023-11-29 08:03:09,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0 2023-11-29 08:03:34,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-29 08:03:34,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3879520.0, ans=0.125 2023-11-29 08:03:39,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3879520.0, ans=0.0 2023-11-29 08:03:44,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3879520.0, ans=0.125 2023-11-29 08:03:51,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3879586.6666666665, ans=0.125 2023-11-29 08:03:59,277 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4800, loss[loss=0.05651, simple_loss=0.07378, pruned_loss=0.01045, audio_tagging_loss=0.009168, over 15095.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08927, pruned_loss=0.01177, audio_tagging_loss=0.008721, over 3049890.17 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:03:59,361 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 581950 2023-11-29 08:04:10,926 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:04:12,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3879720.0, ans=0.125 2023-11-29 08:04:19,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-29 08:04:20,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.15 vs. 
limit=15.0 2023-11-29 08:04:36,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 9.178e+01 9.692e+01 1.041e+02 1.280e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:04:36,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3879853.3333333335, ans=0.125 2023-11-29 08:04:38,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3879853.3333333335, ans=0.125 2023-11-29 08:05:01,405 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4850, loss[loss=0.0758, simple_loss=0.1007, pruned_loss=0.01442, audio_tagging_loss=0.01104, over 14947.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08973, pruned_loss=0.01182, audio_tagging_loss=0.008785, over 3047815.75 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:05:01,589 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582000 2023-11-29 08:05:12,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3879986.6666666665, ans=0.05 2023-11-29 08:05:19,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3880053.3333333335, ans=0.1 2023-11-29 08:05:25,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3880120.0, ans=0.2 2023-11-29 08:05:35,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3880120.0, ans=0.0 2023-11-29 08:05:40,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2023-11-29 08:05:56,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3880253.3333333335, ans=0.2 2023-11-29 08:06:04,479 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4900, loss[loss=0.06439, simple_loss=0.08525, pruned_loss=0.01216, audio_tagging_loss=0.009604, over 16142.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08982, pruned_loss=0.01178, audio_tagging_loss=0.008672, over 3049397.21 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:06:04,562 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582050 2023-11-29 08:06:24,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3880386.6666666665, ans=0.125 2023-11-29 08:06:43,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.902e+01 9.348e+01 9.931e+01 1.050e+02 1.310e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 08:06:49,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-11-29 08:07:02,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-11-29 08:07:05,355 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 4950, loss[loss=0.06596, simple_loss=0.09389, pruned_loss=0.006923, audio_tagging_loss=0.01209, over 14753.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09051, pruned_loss=0.01182, audio_tagging_loss=0.008509, over 3052215.09 frames. 
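
The ScheduledFloat entries record module hyperparameters (dropout probabilities, skip rates, balancer probabilities) whose values follow a schedule in batch_count; the log prints the current value as ans. This deep into training (batch_count ~ 3.88M) they sit on their final plateau, which is why the same names keep reporting the same values. A minimal sketch of such a piecewise-linear schedule (illustrative; the real class is defined in scaling.py):

```python
# Minimal batch-count schedule, analogous to the ScheduledFloat values above.
class PiecewiseLinear:
    def __init__(self, *points):  # (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, x: float) -> float:
        pts = self.points
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3_880_000.0))  # long past the ramp: 0.1, like ans=0.1 above
```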
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:07:05,440 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582100 2023-11-29 08:07:12,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2023-11-29 08:07:59,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-11-29 08:08:01,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3880920.0, ans=10.0 2023-11-29 08:08:07,557 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5000, loss[loss=0.05637, simple_loss=0.07734, pruned_loss=0.01051, audio_tagging_loss=0.007196, over 15330.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.09031, pruned_loss=0.01187, audio_tagging_loss=0.008434, over 3056111.37 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:08:07,694 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582150 2023-11-29 08:08:10,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-29 08:08:11,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3880986.6666666665, ans=0.025 2023-11-29 08:08:27,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3881053.3333333335, ans=0.0 2023-11-29 08:08:45,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.159e+01 9.676e+01 1.038e+02 1.226e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 08:09:09,507 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:09:10,345 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5050, loss[loss=0.06113, simple_loss=0.09091, pruned_loss=0.008759, audio_tagging_loss=0.00692, over 16223.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09087, pruned_loss=0.012, audio_tagging_loss=0.008279, over 3049049.86 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:09:10,418 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582200 2023-11-29 08:09:10,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3881320.0, ans=0.125 2023-11-29 08:09:17,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3881320.0, ans=0.125 2023-11-29 08:09:22,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=22.5 2023-11-29 08:09:25,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3881386.6666666665, ans=0.1 2023-11-29 08:10:03,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3881586.6666666665, ans=0.125 2023-11-29 08:10:11,790 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5100, loss[loss=0.07137, simple_loss=0.09676, pruned_loss=0.01379, audio_tagging_loss=0.009204, over 16047.00 frames. 
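
The Whitening lines compare a whiteness statistic of intermediate activations against a limit (metric=X vs. limit=Y); when the metric exceeds the limit, the Whiten module nudges the feature covariance back toward a multiple of the identity. One plausible form of such a metric, normalized so that fully white features score 1.0, is sketched below (an assumption for illustration, not the exact scaling.py formula):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]  # channel covariance
    n = c.shape[0]
    # 1.0 when c is a multiple of the identity (fully "white");
    # grows as the covariance spectrum becomes less uniform.
    return n * (c * c).sum() / c.trace() ** 2

x = torch.randn(1000, 256)  # near-white input
print(whitening_metric(x))  # ~1.3, comfortably under a limit like 15.0
```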
], tot_loss[loss=0.06577, simple_loss=0.09073, pruned_loss=0.01208, audio_tagging_loss=0.008327, over 3049804.60 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:10:11,879 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582250 2023-11-29 08:10:33,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-29 08:10:49,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 8.838e+01 9.435e+01 1.031e+02 1.429e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-29 08:10:50,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2023-11-29 08:11:13,116 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5150, loss[loss=0.04226, simple_loss=0.06181, pruned_loss=0.005057, audio_tagging_loss=0.0063, over 15520.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08992, pruned_loss=0.01194, audio_tagging_loss=0.008408, over 3049244.20 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:11:13,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582300 2023-11-29 08:11:44,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3882120.0, ans=0.2 2023-11-29 08:11:58,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2023-11-29 08:12:03,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3882253.3333333335, ans=0.0 2023-11-29 08:12:05,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3882253.3333333335, ans=0.1 2023-11-29 08:12:15,501 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5200, loss[loss=0.06719, simple_loss=0.1051, pruned_loss=0.009202, audio_tagging_loss=0.005411, over 14583.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.09006, pruned_loss=0.01195, audio_tagging_loss=0.008376, over 3045415.45 frames. ], batch size: 52, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:12:15,604 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582350 2023-11-29 08:12:23,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3882320.0, ans=0.125 2023-11-29 08:12:24,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3882320.0, ans=10.0 2023-11-29 08:12:24,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3882320.0, ans=0.09899494936611666 2023-11-29 08:12:42,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3882453.3333333335, ans=0.125 2023-11-29 08:12:45,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. 
limit=15.0 2023-11-29 08:12:49,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3882453.3333333335, ans=0.125 2023-11-29 08:12:52,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.958e+01 9.640e+01 1.041e+02 1.476e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 08:13:04,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3882586.6666666665, ans=0.125 2023-11-29 08:13:12,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-11-29 08:13:16,406 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5250, loss[loss=0.0808, simple_loss=0.1082, pruned_loss=0.01766, audio_tagging_loss=0.009071, over 15648.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09033, pruned_loss=0.01199, audio_tagging_loss=0.008326, over 3048442.12 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:13:16,494 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582400 2023-11-29 08:13:24,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3882653.3333333335, ans=0.2 2023-11-29 08:13:30,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3882720.0, ans=0.0 2023-11-29 08:13:34,200 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:13:34,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3882720.0, ans=0.1 2023-11-29 08:13:58,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3882853.3333333335, ans=0.125 2023-11-29 08:14:18,871 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5300, loss[loss=0.07943, simple_loss=0.1062, pruned_loss=0.01548, audio_tagging_loss=0.01085, over 15804.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08975, pruned_loss=0.01199, audio_tagging_loss=0.008315, over 3055741.01 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:14:18,972 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582450 2023-11-29 08:14:21,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3882986.6666666665, ans=0.125 2023-11-29 08:14:27,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=3882986.6666666665, ans=12.0 2023-11-29 08:14:31,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3883053.3333333335, ans=0.125 2023-11-29 08:14:32,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3883053.3333333335, ans=0.025 2023-11-29 08:14:46,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. 
limit=22.5 2023-11-29 08:14:48,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3883120.0, ans=0.125 2023-11-29 08:14:57,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.053e+01 9.676e+01 1.034e+02 1.415e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 08:15:07,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3883253.3333333335, ans=0.0 2023-11-29 08:15:08,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3883253.3333333335, ans=0.05 2023-11-29 08:15:13,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.27 vs. limit=10.0 2023-11-29 08:15:14,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3883253.3333333335, ans=0.0 2023-11-29 08:15:20,460 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5350, loss[loss=0.0597, simple_loss=0.07895, pruned_loss=0.01102, audio_tagging_loss=0.0092, over 15155.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09032, pruned_loss=0.01207, audio_tagging_loss=0.008358, over 3053087.53 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:15:20,573 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582500 2023-11-29 08:15:20,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3883320.0, ans=0.2 2023-11-29 08:15:53,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3883453.3333333335, ans=0.125 2023-11-29 08:15:56,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3883520.0, ans=0.125 2023-11-29 08:15:58,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2023-11-29 08:16:06,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3883520.0, ans=0.125 2023-11-29 08:16:21,986 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5400, loss[loss=0.07224, simple_loss=0.09043, pruned_loss=0.01817, audio_tagging_loss=0.00885, over 14417.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08982, pruned_loss=0.01199, audio_tagging_loss=0.008493, over 3051630.56 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:16:22,101 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582550 2023-11-29 08:16:30,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3883653.3333333335, ans=0.0 2023-11-29 08:16:47,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3883786.6666666665, ans=0.2 2023-11-29 08:17:01,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 9.215e+01 9.741e+01 1.047e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 08:17:21,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3883920.0, ans=0.0 2023-11-29 08:17:23,118 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5450, loss[loss=0.05057, simple_loss=0.07605, pruned_loss=0.005323, audio_tagging_loss=0.007221, over 16091.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09028, pruned_loss=0.01196, audio_tagging_loss=0.008493, over 3050878.83 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:17:23,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582600 2023-11-29 08:17:24,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=6.0 2023-11-29 08:17:42,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3884053.3333333335, ans=0.1 2023-11-29 08:17:43,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3884053.3333333335, ans=0.125 2023-11-29 08:17:54,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3884120.0, ans=0.125 2023-11-29 08:18:12,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3884253.3333333335, ans=0.0 2023-11-29 08:18:24,686 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5500, loss[loss=0.06414, simple_loss=0.08385, pruned_loss=0.009062, audio_tagging_loss=0.01315, over 15579.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08975, pruned_loss=0.01189, audio_tagging_loss=0.008591, over 3047534.37 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:18:24,798 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582650 2023-11-29 08:18:44,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3884386.6666666665, ans=0.0 2023-11-29 08:18:53,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3884453.3333333335, ans=10.0 2023-11-29 08:19:03,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 9.075e+01 9.683e+01 1.060e+02 1.497e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-29 08:19:14,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3884586.6666666665, ans=0.0 2023-11-29 08:19:18,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3884586.6666666665, ans=0.0 2023-11-29 08:19:18,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3884586.6666666665, ans=0.1 2023-11-29 08:19:21,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3884586.6666666665, ans=0.1 2023-11-29 08:19:25,591 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5550, loss[loss=0.05168, simple_loss=0.0677, pruned_loss=0.008177, audio_tagging_loss=0.009652, over 14734.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08954, pruned_loss=0.01193, audio_tagging_loss=0.008685, over 3043435.16 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:19:25,692 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582700 2023-11-29 08:19:41,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3884720.0, ans=0.125 2023-11-29 08:19:41,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2023-11-29 08:20:09,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3884853.3333333335, ans=0.125 2023-11-29 08:20:13,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3884920.0, ans=0.125 2023-11-29 08:20:21,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3884920.0, ans=0.2 2023-11-29 08:20:26,547 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5600, loss[loss=0.06031, simple_loss=0.08137, pruned_loss=0.01315, audio_tagging_loss=0.006475, over 15123.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08968, pruned_loss=0.01178, audio_tagging_loss=0.008729, over 3043644.84 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:20:26,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582750 2023-11-29 08:21:06,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.122e+01 9.786e+01 1.074e+02 1.432e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 08:21:10,453 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:21:20,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3885253.3333333335, ans=0.0 2023-11-29 08:21:27,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3885320.0, ans=0.125 2023-11-29 08:21:28,652 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5650, loss[loss=0.05156, simple_loss=0.07121, pruned_loss=0.007883, audio_tagging_loss=0.008074, over 15840.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09082, pruned_loss=0.01192, audio_tagging_loss=0.008716, over 3055722.02 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:21:28,754 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582800 2023-11-29 08:21:35,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2023-11-29 08:21:41,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3885386.6666666665, ans=0.125 2023-11-29 08:21:49,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3885386.6666666665, ans=0.125 2023-11-29 08:22:23,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-29 08:22:29,912 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5700, loss[loss=0.0831, simple_loss=0.1077, pruned_loss=0.02187, audio_tagging_loss=0.007409, over 15808.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09002, pruned_loss=0.01198, audio_tagging_loss=0.00878, over 3053127.75 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:22:30,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582850 2023-11-29 08:22:43,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3885720.0, ans=0.125 2023-11-29 08:22:55,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.18 vs. limit=15.0 2023-11-29 08:22:57,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3885786.6666666665, ans=0.2 2023-11-29 08:23:10,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 9.240e+01 1.005e+02 1.081e+02 1.357e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-29 08:23:13,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3885853.3333333335, ans=0.0 2023-11-29 08:23:19,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3885920.0, ans=0.2 2023-11-29 08:23:31,319 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5750, loss[loss=0.06, simple_loss=0.08534, pruned_loss=0.009823, audio_tagging_loss=0.007506, over 16035.00 frames. 
], tot_loss[loss=0.06575, simple_loss=0.09026, pruned_loss=0.012, audio_tagging_loss=0.008619, over 3048902.83 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:23:31,407 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582900 2023-11-29 08:23:42,879 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:24:01,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3886120.0, ans=0.1 2023-11-29 08:24:08,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3886186.6666666665, ans=0.125 2023-11-29 08:24:20,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-29 08:24:20,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-29 08:24:22,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3886253.3333333335, ans=0.125 2023-11-29 08:24:32,455 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5800, loss[loss=0.057, simple_loss=0.07334, pruned_loss=0.01032, audio_tagging_loss=0.01001, over 15424.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08914, pruned_loss=0.0119, audio_tagging_loss=0.008632, over 3046450.21 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:24:32,643 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 582950 2023-11-29 08:24:36,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0 2023-11-29 08:24:37,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3886320.0, ans=0.0 2023-11-29 08:24:52,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3886386.6666666665, ans=0.125 2023-11-29 08:24:52,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3886386.6666666665, ans=0.0 2023-11-29 08:25:03,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3886453.3333333335, ans=0.125 2023-11-29 08:25:13,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 9.182e+01 9.577e+01 1.050e+02 1.266e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 08:25:14,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3886520.0, ans=0.125 2023-11-29 08:25:28,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3886586.6666666665, ans=0.125 2023-11-29 08:25:28,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. 
limit=10.0 2023-11-29 08:25:31,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5 2023-11-29 08:25:33,498 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5850, loss[loss=0.05824, simple_loss=0.08409, pruned_loss=0.009139, audio_tagging_loss=0.007056, over 15281.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08894, pruned_loss=0.0118, audio_tagging_loss=0.008594, over 3039864.80 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:25:33,575 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583000 2023-11-29 08:25:44,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3886720.0, ans=0.125 2023-11-29 08:25:58,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2023-11-29 08:26:03,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3886786.6666666665, ans=0.125 2023-11-29 08:26:11,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3886853.3333333335, ans=0.0 2023-11-29 08:26:22,299 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:26:36,832 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5900, loss[loss=0.05383, simple_loss=0.07064, pruned_loss=0.009976, audio_tagging_loss=0.008531, over 15737.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08988, pruned_loss=0.01183, audio_tagging_loss=0.008475, over 3053259.07 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:26:36,918 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583050 2023-11-29 08:27:17,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 9.266e+01 9.950e+01 1.077e+02 1.290e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 08:27:29,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3887253.3333333335, ans=0.0 2023-11-29 08:27:38,601 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 5950, loss[loss=0.06753, simple_loss=0.08957, pruned_loss=0.01654, audio_tagging_loss=0.006206, over 14108.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09007, pruned_loss=0.01208, audio_tagging_loss=0.008416, over 3054234.67 frames. ], batch size: 52, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:27:38,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583100 2023-11-29 08:27:38,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3887320.0, ans=0.125 2023-11-29 08:27:48,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-29 08:27:50,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3887386.6666666665, ans=0.5 2023-11-29 08:27:54,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. 
limit=15.0 2023-11-29 08:27:57,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3887386.6666666665, ans=0.125 2023-11-29 08:28:10,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3887453.3333333335, ans=0.0 2023-11-29 08:28:38,495 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:28:40,700 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6000, loss[loss=0.06347, simple_loss=0.08299, pruned_loss=0.01244, audio_tagging_loss=0.009536, over 15397.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08914, pruned_loss=0.01176, audio_tagging_loss=0.008452, over 3054042.67 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:28:40,701 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 08:29:12,862 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0067, 4.0335, 4.9155, 4.4483], device='cuda:1') 2023-11-29 08:29:20,054 INFO [train_asr.py:1267] (1/4) Epoch 49, validation: loss=0.05758, simple_loss=0.05041, pruned_loss=0.005303, audio_tagging_loss=0.02707, over 4681554.00 frames. 2023-11-29 08:29:20,055 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 08:29:20,133 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583150 2023-11-29 08:29:43,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3887720.0, ans=0.125 2023-11-29 08:29:56,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0 2023-11-29 08:29:59,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3887853.3333333335, ans=0.95 2023-11-29 08:30:00,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 8.890e+01 9.631e+01 1.036e+02 1.251e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:30:05,507 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:30:06,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3887853.3333333335, ans=0.125 2023-11-29 08:30:06,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3887853.3333333335, ans=0.125 2023-11-29 08:30:22,502 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6050, loss[loss=0.07166, simple_loss=0.09889, pruned_loss=0.01355, audio_tagging_loss=0.008671, over 14936.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08959, pruned_loss=0.01193, audio_tagging_loss=0.008421, over 3052654.00 frames. 
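
At batch 6000 the loop pauses for periodic validation: it logs the per-component validation loss over the full 4,681,554-frame dev set, then reports the peak GPU memory on this rank. The memory line maps directly onto PyTorch's allocator counter, roughly as follows (assuming CUDA is available on the logging rank):

```python
import torch

# How a line like "Maximum memory allocated so far is 25568MB" is produced.
mb = torch.cuda.max_memory_allocated(torch.device("cuda:1")) // (1024 * 1024)
print(f"Maximum memory allocated so far is {mb}MB")
```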
], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:30:22,582 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583200 2023-11-29 08:30:40,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3888053.3333333335, ans=0.0 2023-11-29 08:30:43,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3888053.3333333335, ans=0.2 2023-11-29 08:31:02,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=12.0 2023-11-29 08:31:24,421 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6100, loss[loss=0.08092, simple_loss=0.1128, pruned_loss=0.01688, audio_tagging_loss=0.007654, over 15015.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09017, pruned_loss=0.01209, audio_tagging_loss=0.008363, over 3054623.38 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:31:24,567 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583250 2023-11-29 08:31:32,979 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:31:55,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3888453.3333333335, ans=0.2 2023-11-29 08:31:58,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3888453.3333333335, ans=0.125 2023-11-29 08:32:01,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2023-11-29 08:32:05,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.225e+01 1.004e+02 1.045e+02 1.351e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 08:32:07,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3888520.0, ans=0.0 2023-11-29 08:32:13,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3888586.6666666665, ans=0.125 2023-11-29 08:32:25,412 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6150, loss[loss=0.07386, simple_loss=0.1062, pruned_loss=0.01327, audio_tagging_loss=0.007511, over 15375.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09003, pruned_loss=0.01211, audio_tagging_loss=0.00833, over 3058698.58 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:32:25,504 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583300 2023-11-29 08:32:35,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3888653.3333333335, ans=0.125 2023-11-29 08:32:44,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-11-29 08:32:51,719 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:33:00,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.11 vs. 
limit=22.5 2023-11-29 08:33:26,851 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6200, loss[loss=0.08646, simple_loss=0.1168, pruned_loss=0.01758, audio_tagging_loss=0.01047, over 14922.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08957, pruned_loss=0.01205, audio_tagging_loss=0.008421, over 3059659.04 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:33:26,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583350 2023-11-29 08:33:28,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0 2023-11-29 08:33:31,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0 2023-11-29 08:33:39,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2023-11-29 08:33:40,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3889053.3333333335, ans=0.0 2023-11-29 08:33:49,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2023-11-29 08:34:08,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 9.108e+01 9.848e+01 1.055e+02 1.947e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 08:34:29,501 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6250, loss[loss=0.08298, simple_loss=0.1098, pruned_loss=0.0203, audio_tagging_loss=0.007796, over 14455.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.089, pruned_loss=0.01205, audio_tagging_loss=0.008565, over 3058246.12 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:34:29,617 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583400 2023-11-29 08:34:32,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3889320.0, ans=0.125 2023-11-29 08:34:45,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3889386.6666666665, ans=0.0 2023-11-29 08:34:53,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-29 08:35:30,160 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6300, loss[loss=0.0612, simple_loss=0.07416, pruned_loss=0.01072, audio_tagging_loss=0.0134, over 14448.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08896, pruned_loss=0.012, audio_tagging_loss=0.008768, over 3054771.91 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:35:30,285 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583450 2023-11-29 08:35:38,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. 
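
The optim.py:476 lines summarize recent gradient norms as quartiles (min, 25%, median, 75%, max) next to a clipping threshold, and throughout this log the threshold equals clipping_scale times the median quartile (e.g. 2.0 * 9.848e+01 = 1.970e+02 in the record above). A small sketch of that bookkeeping follows; it is not icefall's actual ScaledAdam code, and the window size is an arbitrary assumption.

    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)   # recent total grad norms
            self.num_clipped = 0
            self.num_steps = 0

        def __call__(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            self.norms.append(norm.item())
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # scale * median
            self.num_steps += 1
            if norm.item() > threshold:                    # clip overlarge steps
                self.num_clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm.item())
            print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
                  f"percent-clipped={100 * self.num_clipped / self.num_steps:.1f}")
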
limit=6.0 2023-11-29 08:35:48,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3889720.0, ans=0.125 2023-11-29 08:35:49,029 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:36:11,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3889853.3333333335, ans=0.125 2023-11-29 08:36:13,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 9.155e+01 9.755e+01 1.058e+02 1.266e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 08:36:23,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3889920.0, ans=0.125 2023-11-29 08:36:32,768 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6350, loss[loss=0.05698, simple_loss=0.07212, pruned_loss=0.009274, audio_tagging_loss=0.01164, over 15963.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08935, pruned_loss=0.01211, audio_tagging_loss=0.008842, over 3049785.60 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:36:32,888 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583500 2023-11-29 08:36:34,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3889986.6666666665, ans=0.07 2023-11-29 08:36:53,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.97 vs. limit=10.0 2023-11-29 08:37:03,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3890120.0, ans=0.125 2023-11-29 08:37:07,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3890120.0, ans=0.0 2023-11-29 08:37:15,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3890186.6666666665, ans=0.1 2023-11-29 08:37:36,013 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6400, loss[loss=0.05538, simple_loss=0.06673, pruned_loss=0.01033, audio_tagging_loss=0.01168, over 15991.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08925, pruned_loss=0.01194, audio_tagging_loss=0.00887, over 3056050.23 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:37:36,113 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583550 2023-11-29 08:37:38,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3890320.0, ans=0.125 2023-11-29 08:37:41,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-29 08:37:54,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3890386.6666666665, ans=0.0 2023-11-29 08:37:57,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. 
limit=22.5 2023-11-29 08:37:57,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3890386.6666666665, ans=0.125 2023-11-29 08:38:00,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2023-11-29 08:38:13,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3890520.0, ans=0.0 2023-11-29 08:38:17,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.981e+01 9.586e+01 1.023e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 08:38:36,782 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6450, loss[loss=0.06499, simple_loss=0.08256, pruned_loss=0.01211, audio_tagging_loss=0.01159, over 14930.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08965, pruned_loss=0.012, audio_tagging_loss=0.008927, over 3056381.19 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:38:36,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583600 2023-11-29 08:39:18,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.70 vs. limit=10.0 2023-11-29 08:39:39,032 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6500, loss[loss=0.06145, simple_loss=0.08996, pruned_loss=0.01022, audio_tagging_loss=0.006254, over 16380.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08953, pruned_loss=0.01194, audio_tagging_loss=0.00888, over 3056657.87 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:39:39,125 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583650 2023-11-29 08:39:40,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3890986.6666666665, ans=0.125 2023-11-29 08:39:40,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-11-29 08:40:13,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3891120.0, ans=0.125 2023-11-29 08:40:22,314 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.191e+01 9.311e+01 1.000e+02 1.077e+02 1.349e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-29 08:40:25,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2023-11-29 08:40:41,261 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6550, loss[loss=0.0542, simple_loss=0.07582, pruned_loss=0.00999, audio_tagging_loss=0.006301, over 14836.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08893, pruned_loss=0.01186, audio_tagging_loss=0.008799, over 3052048.64 frames. 
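
Most of the scaling.py:213 lines trace ScheduledFloat values: scalar hyperparameters such as skip rates, balancer probabilities, dropout p, and min_abs/max_abs bounds whose current value ("ans") is a function of batch_count. The toy version below is piecewise-linear in batch count; the schedule points are invented for illustration and are not the ones used in this run.

    class ScheduledFloat:
        """A float that interpolates linearly between (batch_count, value) points."""

        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))
    print(conv_skip_rate.value(3898520.0))  # 0.0, matching "ans=0.0" late in training
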
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:40:41,350 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583700 2023-11-29 08:40:51,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3891320.0, ans=0.125 2023-11-29 08:41:01,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3891386.6666666665, ans=0.2 2023-11-29 08:41:29,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3891586.6666666665, ans=0.125 2023-11-29 08:41:43,074 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6600, loss[loss=0.07409, simple_loss=0.106, pruned_loss=0.01513, audio_tagging_loss=0.005977, over 14578.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0886, pruned_loss=0.0119, audio_tagging_loss=0.008683, over 3054005.01 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:41:43,166 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583750 2023-11-29 08:41:50,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3891653.3333333335, ans=0.0 2023-11-29 08:41:50,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3891653.3333333335, ans=0.125 2023-11-29 08:41:52,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3891653.3333333335, ans=0.125 2023-11-29 08:42:03,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3891720.0, ans=0.0 2023-11-29 08:42:16,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3891786.6666666665, ans=0.025 2023-11-29 08:42:26,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 9.392e+01 1.005e+02 1.057e+02 1.408e+02, threshold=2.010e+02, percent-clipped=0.0 2023-11-29 08:42:45,095 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6650, loss[loss=0.08252, simple_loss=0.1081, pruned_loss=0.0199, audio_tagging_loss=0.008578, over 15789.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08855, pruned_loss=0.01186, audio_tagging_loss=0.008618, over 3060883.30 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:42:45,231 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583800 2023-11-29 08:43:00,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3892053.3333333335, ans=0.0 2023-11-29 08:43:05,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3892053.3333333335, ans=0.0 2023-11-29 08:43:31,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3892186.6666666665, ans=0.09899494936611666 2023-11-29 08:43:35,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3892253.3333333335, ans=0.0 2023-11-29 08:43:48,138 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6700, loss[loss=0.06362, simple_loss=0.09041, pruned_loss=0.01105, audio_tagging_loss=0.007365, over 14813.00 frames. 
], tot_loss[loss=0.06479, simple_loss=0.08854, pruned_loss=0.01195, audio_tagging_loss=0.008575, over 3048767.16 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:43:48,221 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583850 2023-11-29 08:44:05,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3892386.6666666665, ans=0.1 2023-11-29 08:44:31,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.029e+01 9.606e+01 1.021e+02 1.283e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 08:44:41,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3892586.6666666665, ans=0.0 2023-11-29 08:44:49,282 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6750, loss[loss=0.06501, simple_loss=0.09695, pruned_loss=0.009998, audio_tagging_loss=0.006539, over 14685.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08828, pruned_loss=0.01174, audio_tagging_loss=0.008549, over 3046625.98 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:44:49,406 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583900 2023-11-29 08:44:55,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-11-29 08:45:04,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3892720.0, ans=0.125 2023-11-29 08:45:26,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3892853.3333333335, ans=0.015 2023-11-29 08:45:26,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3892853.3333333335, ans=0.125 2023-11-29 08:45:51,280 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6800, loss[loss=0.05869, simple_loss=0.08224, pruned_loss=0.01156, audio_tagging_loss=0.006011, over 14818.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08798, pruned_loss=0.01183, audio_tagging_loss=0.008607, over 3044594.99 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:45:51,359 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 583950 2023-11-29 08:46:34,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 9.158e+01 9.734e+01 1.076e+02 1.387e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 08:46:44,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3893253.3333333335, ans=0.2 2023-11-29 08:46:48,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3893253.3333333335, ans=0.0 2023-11-29 08:46:50,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3893253.3333333335, ans=0.95 2023-11-29 08:46:53,210 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6850, loss[loss=0.08502, simple_loss=0.1147, pruned_loss=0.0178, audio_tagging_loss=0.009853, over 14676.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08787, pruned_loss=0.01178, audio_tagging_loss=0.008643, over 3042266.82 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:46:53,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584000 2023-11-29 08:47:11,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3893386.6666666665, ans=0.125 2023-11-29 08:47:11,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3893386.6666666665, ans=0.1 2023-11-29 08:47:15,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3893386.6666666665, ans=0.125 2023-11-29 08:47:18,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3893386.6666666665, ans=0.0 2023-11-29 08:47:23,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3893453.3333333335, ans=0.125 2023-11-29 08:47:30,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3893453.3333333335, ans=0.0 2023-11-29 08:47:45,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3893586.6666666665, ans=0.125 2023-11-29 08:47:56,374 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6900, loss[loss=0.07668, simple_loss=0.1055, pruned_loss=0.01796, audio_tagging_loss=0.005994, over 14998.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08802, pruned_loss=0.01184, audio_tagging_loss=0.008561, over 3046755.35 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:47:56,485 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584050 2023-11-29 08:48:15,655 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:48:36,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3893853.3333333335, ans=0.125 2023-11-29 08:48:39,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.126e+01 9.494e+01 1.002e+02 1.227e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 08:48:39,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3893853.3333333335, ans=0.0 2023-11-29 08:48:43,250 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:48:45,226 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 08:48:55,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3893920.0, ans=0.1 2023-11-29 08:48:57,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3893986.6666666665, ans=0.125 2023-11-29 08:48:58,289 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 6950, loss[loss=0.06644, simple_loss=0.1006, pruned_loss=0.01144, audio_tagging_loss=0.004684, over 14989.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08816, pruned_loss=0.01171, audio_tagging_loss=0.008486, over 3045406.43 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:48:58,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584100 2023-11-29 08:49:26,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3894120.0, ans=0.0 2023-11-29 08:49:28,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3894120.0, ans=0.1 2023-11-29 08:49:29,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0 2023-11-29 08:49:34,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3894186.6666666665, ans=0.125 2023-11-29 08:49:51,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3894253.3333333335, ans=0.125 2023-11-29 08:49:58,686 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7000, loss[loss=0.07024, simple_loss=0.09596, pruned_loss=0.01279, audio_tagging_loss=0.009474, over 14681.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08801, pruned_loss=0.01168, audio_tagging_loss=0.008537, over 3045392.02 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:49:58,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584150 2023-11-29 08:50:09,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3894386.6666666665, ans=0.0 2023-11-29 08:50:10,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3894386.6666666665, ans=0.0 2023-11-29 08:50:12,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=12.0 2023-11-29 08:50:13,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3894386.6666666665, ans=0.125 2023-11-29 08:50:21,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3894386.6666666665, ans=0.125 2023-11-29 08:50:22,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3894453.3333333335, ans=0.0 2023-11-29 08:50:42,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.892e+01 9.505e+01 1.031e+02 1.228e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 08:51:01,065 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7050, loss[loss=0.06773, simple_loss=0.09472, pruned_loss=0.01399, audio_tagging_loss=0.00639, over 15368.00 frames. 
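
The WARNING above (and its twins elsewhere in this epoch) drops a one-second AudioSet placeholder cut: 100 feature frames shrink to 23 encoder frames after subsampling, which is fewer than the cut's 24 BPE tokens, so a transducer loss cannot be computed for it. A sketch of such a filter follows; the subsampling formula is an assumption chosen only because it reproduces the logged 100 -> 23.

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed Conv2d-style subsampling: (T - 7) // 2, then a further
        # halving, giving 100 -> 46 -> 23 frames.
        frames_after = ((num_frames - 7) // 2) // 2
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False: 23 encoder frames < 24 tokens, so exclude
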
], tot_loss[loss=0.06406, simple_loss=0.08752, pruned_loss=0.01168, audio_tagging_loss=0.008614, over 3050708.89 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:51:01,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584200 2023-11-29 08:51:27,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3894786.6666666665, ans=10.0 2023-11-29 08:51:31,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3894786.6666666665, ans=0.125 2023-11-29 08:51:37,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3894853.3333333335, ans=0.125 2023-11-29 08:51:46,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3894853.3333333335, ans=0.09899494936611666 2023-11-29 08:51:46,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3894853.3333333335, ans=0.2 2023-11-29 08:52:01,495 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7100, loss[loss=0.07854, simple_loss=0.1124, pruned_loss=0.01394, audio_tagging_loss=0.008413, over 15754.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08835, pruned_loss=0.0118, audio_tagging_loss=0.008555, over 3050383.63 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:52:01,597 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584250 2023-11-29 08:52:12,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=22.5 2023-11-29 08:52:13,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3895053.3333333335, ans=0.09899494936611666 2023-11-29 08:52:19,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3895053.3333333335, ans=0.125 2023-11-29 08:52:21,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3895053.3333333335, ans=0.0 2023-11-29 08:52:26,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3895120.0, ans=0.125 2023-11-29 08:52:45,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.114e+01 9.630e+01 1.033e+02 1.554e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:52:50,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3895253.3333333335, ans=0.025 2023-11-29 08:53:03,109 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7150, loss[loss=0.06502, simple_loss=0.09307, pruned_loss=0.01273, audio_tagging_loss=0.005755, over 14976.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08883, pruned_loss=0.01191, audio_tagging_loss=0.008582, over 3047921.39 frames. 
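
The learning rate in these records decays very slowly: 1.38e-03 through batch ~583k of training, then 1.37e-03 from around batch 7000 of the epoch onward. An Eden-style schedule of the kind used in Zipformer recipes would produce this shape; the constants below (base LR 0.045, lr_batches 7500, lr_epochs 3.5) are illustrative assumptions, not values read from these lines.

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # LR shrinks smoothly with both the global batch index and the epoch.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(f"{eden_lr(0.045, 584200, 49):.2e}")  # ~1.36e-03, near the logged 1.37e-03
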
], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:53:03,206 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584300 2023-11-29 08:53:03,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3895320.0, ans=0.5 2023-11-29 08:53:12,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3895320.0, ans=0.0 2023-11-29 08:53:13,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3895320.0, ans=0.0 2023-11-29 08:53:26,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3895453.3333333335, ans=0.1 2023-11-29 08:53:28,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3895453.3333333335, ans=0.125 2023-11-29 08:53:41,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3895520.0, ans=0.125 2023-11-29 08:53:48,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.52 vs. limit=10.0 2023-11-29 08:53:57,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2023-11-29 08:54:04,780 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7200, loss[loss=0.05384, simple_loss=0.06925, pruned_loss=0.007738, audio_tagging_loss=0.01148, over 16528.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08881, pruned_loss=0.01181, audio_tagging_loss=0.008692, over 3048265.90 frames. ], batch size: 66, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:54:04,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584350 2023-11-29 08:54:21,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3895720.0, ans=0.0 2023-11-29 08:54:24,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3895720.0, ans=0.0 2023-11-29 08:54:37,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3895786.6666666665, ans=0.0 2023-11-29 08:54:48,870 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.925e+01 9.835e+01 1.040e+02 1.813e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 08:55:04,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. limit=10.0 2023-11-29 08:55:05,310 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7250, loss[loss=0.06807, simple_loss=0.0984, pruned_loss=0.0078, audio_tagging_loss=0.01107, over 15542.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08879, pruned_loss=0.01183, audio_tagging_loss=0.0087, over 3046987.32 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:55:05,409 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584400 2023-11-29 08:55:32,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. 
limit=6.0 2023-11-29 08:55:43,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896186.6666666665, ans=0.1 2023-11-29 08:55:44,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3896186.6666666665, ans=0.125 2023-11-29 08:55:51,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-11-29 08:56:07,929 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7300, loss[loss=0.07475, simple_loss=0.09824, pruned_loss=0.01636, audio_tagging_loss=0.009277, over 16147.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08876, pruned_loss=0.01175, audio_tagging_loss=0.008654, over 3040787.15 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:56:08,012 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584450 2023-11-29 08:56:08,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2023-11-29 08:56:21,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3896386.6666666665, ans=0.125 2023-11-29 08:56:22,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-29 08:56:35,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3896453.3333333335, ans=0.125 2023-11-29 08:56:51,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.158e+01 9.688e+01 1.038e+02 1.242e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:56:59,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3896586.6666666665, ans=0.2 2023-11-29 08:57:07,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3896586.6666666665, ans=0.125 2023-11-29 08:57:08,974 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7350, loss[loss=0.05676, simple_loss=0.07909, pruned_loss=0.009514, audio_tagging_loss=0.007702, over 15446.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08814, pruned_loss=0.01155, audio_tagging_loss=0.008584, over 3037214.34 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:57:09,088 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584500 2023-11-29 08:57:24,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896720.0, ans=0.1 2023-11-29 08:57:38,158 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:57:58,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-11-29 08:58:09,724 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7400, loss[loss=0.07193, simple_loss=0.09708, pruned_loss=0.01419, audio_tagging_loss=0.009198, over 15272.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08885, pruned_loss=0.01172, audio_tagging_loss=0.00849, over 3041674.84 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:58:09,835 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584550 2023-11-29 08:58:29,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3897053.3333333335, ans=0.0 2023-11-29 08:58:54,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 9.320e+01 9.934e+01 1.095e+02 1.320e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-29 08:58:59,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3897253.3333333335, ans=0.0 2023-11-29 08:59:01,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2023-11-29 08:59:10,447 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7450, loss[loss=0.06419, simple_loss=0.09017, pruned_loss=0.01082, audio_tagging_loss=0.008282, over 15679.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08946, pruned_loss=0.0118, audio_tagging_loss=0.00847, over 3040501.84 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:59:10,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584600 2023-11-29 08:59:27,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3897386.6666666665, ans=0.0 2023-11-29 08:59:37,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3897453.3333333335, ans=0.0 2023-11-29 08:59:50,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3897520.0, ans=0.125 2023-11-29 09:00:11,644 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7500, loss[loss=0.04886, simple_loss=0.07187, pruned_loss=0.006813, audio_tagging_loss=0.006105, over 15141.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08944, pruned_loss=0.01181, audio_tagging_loss=0.008396, over 3039246.61 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:00:11,759 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584650 2023-11-29 09:00:21,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3897653.3333333335, ans=0.125 2023-11-29 09:00:22,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3897720.0, ans=0.2 2023-11-29 09:00:27,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3897720.0, ans=0.125 2023-11-29 09:00:33,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3897720.0, ans=0.0 2023-11-29 09:00:35,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. 
limit=12.0 2023-11-29 09:00:36,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3897786.6666666665, ans=0.125 2023-11-29 09:00:48,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3897853.3333333335, ans=0.125 2023-11-29 09:00:48,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0 2023-11-29 09:00:50,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3897853.3333333335, ans=0.2 2023-11-29 09:00:57,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.073e+01 9.674e+01 1.060e+02 1.310e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 09:01:12,425 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7550, loss[loss=0.03605, simple_loss=0.04076, pruned_loss=0.004162, audio_tagging_loss=0.01151, over 16319.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08933, pruned_loss=0.01196, audio_tagging_loss=0.008398, over 3048594.42 frames. ], batch size: 64, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:01:12,571 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584700 2023-11-29 09:01:30,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3898053.3333333335, ans=0.125 2023-11-29 09:01:30,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3898053.3333333335, ans=0.1 2023-11-29 09:01:41,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3898120.0, ans=0.125 2023-11-29 09:02:07,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3898253.3333333335, ans=0.125 2023-11-29 09:02:09,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3898253.3333333335, ans=0.125 2023-11-29 09:02:11,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3898253.3333333335, ans=0.125 2023-11-29 09:02:13,855 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7600, loss[loss=0.07266, simple_loss=0.1063, pruned_loss=0.01321, audio_tagging_loss=0.006275, over 15975.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08903, pruned_loss=0.01191, audio_tagging_loss=0.008396, over 3050367.45 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:02:13,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584750 2023-11-29 09:02:23,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.86 vs. 
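
The "Whitening: ... metric=M vs. limit=L" lines compare a covariance statistic of a module's activations against a limit: the metric is 1.0 when the feature covariance is a multiple of the identity (fully whitened) and grows as channels become correlated, and the Whiten module appears to apply its gradient penalty only when the metric exceeds the limit. Below is a sketch of the statistic for the single-group case; it is a plausible reconstruction, not the verbatim scaling.py code.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns C * tr(cov^2) / tr(cov)^2,
        # i.e. mean squared eigenvalue over squared mean eigenvalue.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        c = cov.shape[0]
        return (c * (cov * cov).sum() / cov.trace() ** 2).item()

    x = torch.randn(1000, 384)
    print(whitening_metric(x))  # ~1.4 for iid noise, far below limits like 22.5
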
limit=22.5 2023-11-29 09:02:33,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3898386.6666666665, ans=0.0 2023-11-29 09:02:40,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3898453.3333333335, ans=0.1 2023-11-29 09:02:50,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0 2023-11-29 09:02:52,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3898520.0, ans=0.125 2023-11-29 09:02:52,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3898520.0, ans=0.2 2023-11-29 09:02:56,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-29 09:03:00,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.018e+01 9.691e+01 1.036e+02 1.365e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 09:03:08,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3898586.6666666665, ans=0.0 2023-11-29 09:03:16,483 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7650, loss[loss=0.07573, simple_loss=0.1065, pruned_loss=0.01352, audio_tagging_loss=0.008944, over 15005.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08839, pruned_loss=0.01174, audio_tagging_loss=0.008435, over 3049899.43 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:03:16,573 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584800 2023-11-29 09:03:41,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3898786.6666666665, ans=0.125 2023-11-29 09:03:44,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3898786.6666666665, ans=0.0 2023-11-29 09:03:46,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3898786.6666666665, ans=0.125 2023-11-29 09:03:50,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3898786.6666666665, ans=0.0 2023-11-29 09:04:18,692 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7700, loss[loss=0.0631, simple_loss=0.08148, pruned_loss=0.01188, audio_tagging_loss=0.01048, over 15045.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08896, pruned_loss=0.01171, audio_tagging_loss=0.00845, over 3052611.51 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:04:18,820 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584850 2023-11-29 09:05:05,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 9.378e+01 9.780e+01 1.035e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 09:05:19,943 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7750, loss[loss=0.05512, simple_loss=0.07906, pruned_loss=0.009095, audio_tagging_loss=0.006492, over 15046.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08811, pruned_loss=0.0117, audio_tagging_loss=0.008622, over 3045395.74 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:05:20,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584900 2023-11-29 09:05:31,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3899386.6666666665, ans=0.125 2023-11-29 09:05:35,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3899386.6666666665, ans=0.0 2023-11-29 09:05:59,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-29 09:06:13,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-29 09:06:15,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-29 09:06:21,813 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7800, loss[loss=0.04733, simple_loss=0.05662, pruned_loss=0.006364, audio_tagging_loss=0.01265, over 14844.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08901, pruned_loss=0.01195, audio_tagging_loss=0.008597, over 3050837.73 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:06:21,915 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 584950 2023-11-29 09:06:48,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3899786.6666666665, ans=0.0 2023-11-29 09:06:50,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-29 09:07:08,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.073e+01 9.693e+01 1.045e+02 1.224e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 09:07:23,387 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7850, loss[loss=0.07204, simple_loss=0.09604, pruned_loss=0.01677, audio_tagging_loss=0.007255, over 14073.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08912, pruned_loss=0.01192, audio_tagging_loss=0.008616, over 3056461.39 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:07:23,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585000 2023-11-29 09:07:30,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2023-11-29 09:07:35,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3900053.3333333335, ans=0.125 2023-11-29 09:07:44,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3900053.3333333335, ans=0.125 2023-11-29 09:07:54,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. 
limit=22.5 2023-11-29 09:08:02,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3900186.6666666665, ans=0.1 2023-11-29 09:08:24,385 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7900, loss[loss=0.06319, simple_loss=0.09043, pruned_loss=0.01049, audio_tagging_loss=0.007486, over 15304.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08843, pruned_loss=0.0118, audio_tagging_loss=0.008716, over 3053117.63 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:08:24,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585050 2023-11-29 09:08:26,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-29 09:08:40,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2023-11-29 09:08:48,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3900453.3333333335, ans=0.125 2023-11-29 09:08:52,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3900453.3333333335, ans=0.125 2023-11-29 09:08:57,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3900453.3333333335, ans=0.125 2023-11-29 09:08:57,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=22.5 2023-11-29 09:09:10,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.168e+01 9.323e+01 9.871e+01 1.069e+02 1.326e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 09:09:23,934 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 7950, loss[loss=0.06601, simple_loss=0.09367, pruned_loss=0.0127, audio_tagging_loss=0.006479, over 14520.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08858, pruned_loss=0.01184, audio_tagging_loss=0.008777, over 3050433.23 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:09:24,019 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585100 2023-11-29 09:09:24,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3900653.3333333335, ans=0.125 2023-11-29 09:09:40,464 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 09:09:52,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3900786.6666666665, ans=0.125 2023-11-29 09:10:07,049 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:10:18,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3900920.0, ans=0.125 2023-11-29 09:10:18,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3900920.0, ans=0.125 2023-11-29 09:10:24,888 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8000, loss[loss=0.08331, simple_loss=0.1167, pruned_loss=0.01806, audio_tagging_loss=0.006889, over 14626.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.0884, pruned_loss=0.01197, audio_tagging_loss=0.008879, over 3048468.28 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:10:24,972 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585150 2023-11-29 09:10:31,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3900986.6666666665, ans=0.2 2023-11-29 09:10:33,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3900986.6666666665, ans=0.04949747468305833 2023-11-29 09:11:01,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3901186.6666666665, ans=0.2 2023-11-29 09:11:11,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.013e+01 9.705e+01 1.030e+02 1.171e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 09:11:25,773 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8050, loss[loss=0.06472, simple_loss=0.0844, pruned_loss=0.01482, audio_tagging_loss=0.007701, over 13633.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08793, pruned_loss=0.01177, audio_tagging_loss=0.008914, over 3045688.83 frames. 
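
The grad_scale value in the batch records keeps flipping between 16.0 and 32.0 (e.g. back to 32.0 at batch 8000, down to 16.0 again by batch 8100), which is the signature of dynamic loss scaling in fp16 training: the scale is halved when scaled gradients overflow and grown again after a run of clean steps. A generic torch.cuda.amp sketch, with a stand-in model and optimizer rather than the recipe's Zipformer, assuming a CUDA device as in this run:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales grads; skips the step on inf/nan
    scaler.update()                # halves the scale on overflow, else grows it
    print(scaler.get_scale())      # the number train_asr.py logs as grad_scale
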
2023-11-29 09:11:25,773 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8050, loss[loss=0.06472, simple_loss=0.0844, pruned_loss=0.01482, audio_tagging_loss=0.007701, over 13633.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08793, pruned_loss=0.01177, audio_tagging_loss=0.008914, over 3045688.83 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:11:25,890 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585200
2023-11-29 09:11:42,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3901386.6666666665, ans=0.2
2023-11-29 09:12:02,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3901520.0, ans=0.125
2023-11-29 09:12:03,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901520.0, ans=0.1
2023-11-29 09:12:13,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901520.0, ans=0.1
2023-11-29 09:12:15,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3901586.6666666665, ans=0.0
2023-11-29 09:12:24,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3901586.6666666665, ans=0.1
2023-11-29 09:12:26,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3901586.6666666665, ans=0.5
2023-11-29 09:12:27,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3901653.3333333335, ans=0.5
2023-11-29 09:12:28,020 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8100, loss[loss=0.06221, simple_loss=0.08763, pruned_loss=0.009129, audio_tagging_loss=0.009262, over 15543.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08856, pruned_loss=0.01185, audio_tagging_loss=0.008779, over 3046166.40 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:12:28,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585250
2023-11-29 09:12:33,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3901653.3333333335, ans=0.0
2023-11-29 09:12:45,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3901720.0, ans=0.125
2023-11-29 09:12:54,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3901786.6666666665, ans=0.09899494936611666
2023-11-29 09:13:16,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.257e+01 9.923e+01 1.057e+02 1.359e+02, threshold=1.985e+02, percent-clipped=0.0
2023-11-29 09:13:19,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3901920.0, ans=0.125
2023-11-29 09:13:29,950 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8150, loss[loss=0.07466, simple_loss=0.1055, pruned_loss=0.01534, audio_tagging_loss=0.006595, over 16437.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08812, pruned_loss=0.01162, audio_tagging_loss=0.008645, over 3044713.06 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:13:30,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585300
2023-11-29 09:13:30,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3901986.6666666665, ans=0.0
2023-11-29 09:14:15,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3902186.6666666665, ans=0.1
2023-11-29 09:14:18,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3902253.3333333335, ans=0.09899494936611666
2023-11-29 09:14:25,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3902253.3333333335, ans=0.125
2023-11-29 09:14:27,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3902253.3333333335, ans=0.125
2023-11-29 09:14:31,349 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8200, loss[loss=0.06575, simple_loss=0.0861, pruned_loss=0.009758, audio_tagging_loss=0.01294, over 15673.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08822, pruned_loss=0.01173, audio_tagging_loss=0.008583, over 3046280.64 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:14:31,463 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585350
2023-11-29 09:14:33,666 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 09:14:34,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0
2023-11-29 09:14:45,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.35 vs. limit=10.0
2023-11-29 09:15:05,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3902453.3333333335, ans=0.2
2023-11-29 09:15:07,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3902520.0, ans=0.1
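The "Exclude cut" warnings above come from a sanity filter: the AudioSet cuts carry only a dummy transcript, and after the encoder's temporal subsampling this 100-frame cut yields 23 output frames, fewer than its 24 BPE tokens, so no valid transducer alignment exists and the cut is dropped. A sketch of such a filter; the exact subsampling arithmetic is an assumption, chosen here only so that it reproduces the logged pair 100 -> 23:

    # Sketch of the cut filter behind the WARNING lines above. A transducer
    # alignment needs at least as many encoder output frames as target tokens.
    def keep_cut(num_frames_before: int, num_tokens: int) -> bool:
        # assumed conv front-end shrinkage plus 4x subsampling; 100 -> 23 as logged
        num_frames_after = (num_frames_before - 7) // 4
        return num_frames_after >= num_tokens

    print(keep_cut(100, 24))  # -> False: the cut is excluded from training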
2023-11-29 09:15:08,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0
2023-11-29 09:15:17,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3902520.0, ans=0.0
2023-11-29 09:15:18,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3902520.0, ans=0.125
2023-11-29 09:15:19,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.775e+01 9.273e+01 9.882e+01 1.047e+02 1.240e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-29 09:15:34,287 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8250, loss[loss=0.04442, simple_loss=0.05654, pruned_loss=0.007331, audio_tagging_loss=0.008816, over 15665.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08913, pruned_loss=0.01179, audio_tagging_loss=0.008486, over 3056085.27 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:15:34,409 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585400
2023-11-29 09:15:38,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3902653.3333333335, ans=10.0
2023-11-29 09:15:48,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3902720.0, ans=0.125
2023-11-29 09:15:57,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3902720.0, ans=10.0
2023-11-29 09:15:58,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2023-11-29 09:16:14,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3902853.3333333335, ans=0.2
2023-11-29 09:16:22,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0
2023-11-29 09:16:29,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3902920.0, ans=0.2
2023-11-29 09:16:34,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3902920.0, ans=0.125
2023-11-29 09:16:34,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3902920.0, ans=0.0
2023-11-29 09:16:36,689 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8300, loss[loss=0.0731, simple_loss=0.09476, pruned_loss=0.01764, audio_tagging_loss=0.008077, over 15272.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08889, pruned_loss=0.01174, audio_tagging_loss=0.008521, over 3058661.71 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:16:36,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585450
2023-11-29 09:16:40,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0
2023-11-29 09:16:48,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3903053.3333333335, ans=0.125
2023-11-29 09:17:09,892 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 09:17:24,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.975e+01 9.727e+01 1.046e+02 1.425e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-29 09:17:28,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3903253.3333333335, ans=0.125
2023-11-29 09:17:37,225 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8350, loss[loss=0.06866, simple_loss=0.09369, pruned_loss=0.01337, audio_tagging_loss=0.008453, over 15491.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08894, pruned_loss=0.01179, audio_tagging_loss=0.008465, over 3059410.24 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:17:37,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585500
2023-11-29 09:17:38,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3903320.0, ans=0.0
2023-11-29 09:17:39,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3903320.0, ans=0.125
2023-11-29 09:17:44,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3903320.0, ans=0.0
2023-11-29 09:17:47,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3903320.0, ans=0.2
2023-11-29 09:17:48,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0
2023-11-29 09:18:11,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3903453.3333333335, ans=0.125
2023-11-29 09:18:27,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3903586.6666666665, ans=0.1
2023-11-29 09:18:33,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3903586.6666666665, ans=0.0
2023-11-29 09:18:39,266 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8400, loss[loss=0.08187, simple_loss=0.128, pruned_loss=0.01339, audio_tagging_loss=0.004474, over 15181.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08896, pruned_loss=0.01165, audio_tagging_loss=0.008396, over 3061732.10 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:18:39,346 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585550
2023-11-29 09:18:47,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3903653.3333333335, ans=0.035
2023-11-29 09:18:47,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3903653.3333333335, ans=0.0
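The Whitening records above report how anisotropic a module's output covariance is, compared against a per-module limit (num_groups splits the channels before the covariance is taken); the associated penalty only activates while the metric exceeds the limit, so entries like metric=9.95 vs. limit=15.0 are below-threshold diagnostics. One plausible reading of the metric, offered as a sketch rather than the exact formula in scaling.py: the mean of the squared covariance eigenvalues divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions.

    # Sketch of a whitening metric consistent with the log lines above;
    # the true definition in icefall's scaling.py may differ in detail.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels)
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n          # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)        # real eigenvalues (symmetric matrix)
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    feats = torch.randn(1000, 256)
    print(whitening_metric(feats))  # close to 1.0 for white Gaussian features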
2023-11-29 09:19:02,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-29 09:19:12,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3903786.6666666665, ans=0.2
2023-11-29 09:19:28,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.927e+01 9.448e+01 1.050e+02 1.277e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-29 09:19:41,545 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8450, loss[loss=0.05258, simple_loss=0.07197, pruned_loss=0.008945, audio_tagging_loss=0.007651, over 16034.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08826, pruned_loss=0.01151, audio_tagging_loss=0.00848, over 3063442.71 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:19:41,648 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585600
2023-11-29 09:19:48,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3903986.6666666665, ans=0.125
2023-11-29 09:19:48,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3903986.6666666665, ans=0.2
2023-11-29 09:20:07,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3904120.0, ans=0.0
2023-11-29 09:20:32,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3904253.3333333335, ans=0.125
2023-11-29 09:20:42,683 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8500, loss[loss=0.05566, simple_loss=0.07695, pruned_loss=0.009811, audio_tagging_loss=0.007375, over 15480.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08881, pruned_loss=0.0117, audio_tagging_loss=0.008404, over 3060054.22 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:20:42,787 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585650
2023-11-29 09:21:05,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3904386.6666666665, ans=0.5
2023-11-29 09:21:05,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3904386.6666666665, ans=0.0
2023-11-29 09:21:17,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3904453.3333333335, ans=0.125
2023-11-29 09:21:31,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=22.5
2023-11-29 09:21:31,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 9.118e+01 9.879e+01 1.041e+02 1.425e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-29 09:21:40,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0
2023-11-29 09:21:44,140 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8550, loss[loss=0.0567, simple_loss=0.08036, pruned_loss=0.00875, audio_tagging_loss=0.007764, over 14777.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08813, pruned_loss=0.01149, audio_tagging_loss=0.008458, over 3060221.11 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:21:44,245 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585700
2023-11-29 09:21:45,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3904653.3333333335, ans=0.0
2023-11-29 09:21:46,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3904653.3333333335, ans=0.1
2023-11-29 09:21:50,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3904653.3333333335, ans=0.125
2023-11-29 09:22:20,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3904853.3333333335, ans=0.0
2023-11-29 09:22:45,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3904986.6666666665, ans=0.0
2023-11-29 09:22:46,525 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8600, loss[loss=0.06928, simple_loss=0.09462, pruned_loss=0.01464, audio_tagging_loss=0.007335, over 14940.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08945, pruned_loss=0.01164, audio_tagging_loss=0.008366, over 3054970.65 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:22:46,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585750
2023-11-29 09:22:51,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3904986.6666666665, ans=0.0
2023-11-29 09:22:51,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2023-11-29 09:23:27,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5
2023-11-29 09:23:34,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0
2023-11-29 09:23:36,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.131e+01 9.509e+01 1.045e+02 1.246e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-29 09:23:46,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3905320.0, ans=0.0
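Each train_asr record pairs a single-batch loss[...] with a tot_loss[...] computed over roughly 3.05 million frames, i.e. a frame-weighted aggregate over many recent batches; that is why tot_loss stays near 0.064-0.065 while the per-batch loss swings between 0.04442 and 0.08187 in the batches above. A sketch of frame-weighted running aggregation; whether the real tracker forgets old batches exactly this way is an assumption:

    # Sketch of the frame-weighted running loss behind "tot_loss[... over N frames]".
    class RunningLoss:
        def __init__(self, decay: float = 0.999):
            self.decay = decay      # assumed forgetting factor
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            # decay the accumulated statistics, then fold in the new batch
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames

    tracker = RunningLoss()
    for loss, frames in [(0.04442, 15665.0), (0.08187, 15181.0)]:  # values from the log
        print(tracker.update(loss, frames))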
2023-11-29 09:23:47,821 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8650, loss[loss=0.04949, simple_loss=0.07055, pruned_loss=0.005017, audio_tagging_loss=0.009196, over 15118.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08985, pruned_loss=0.01169, audio_tagging_loss=0.008475, over 3056536.65 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:23:47,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585800
2023-11-29 09:23:53,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3905320.0, ans=0.0
2023-11-29 09:24:22,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3905453.3333333335, ans=0.95
2023-11-29 09:24:29,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3905520.0, ans=0.1
2023-11-29 09:24:45,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3905586.6666666665, ans=10.0
2023-11-29 09:24:49,332 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8700, loss[loss=0.08136, simple_loss=0.1179, pruned_loss=0.01595, audio_tagging_loss=0.006437, over 14996.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0895, pruned_loss=0.01164, audio_tagging_loss=0.008481, over 3054175.56 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:24:49,423 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585850
2023-11-29 09:25:24,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0
2023-11-29 09:25:31,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3905853.3333333335, ans=0.1
2023-11-29 09:25:37,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3905920.0, ans=0.0
2023-11-29 09:25:38,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.006e+01 9.124e+01 9.803e+01 1.053e+02 1.295e+02, threshold=1.961e+02, percent-clipped=0.0
2023-11-29 09:25:39,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3905920.0, ans=0.0
2023-11-29 09:25:44,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3905920.0, ans=0.125
2023-11-29 09:25:51,256 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8750, loss[loss=0.07322, simple_loss=0.1016, pruned_loss=0.01181, audio_tagging_loss=0.01063, over 16350.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08937, pruned_loss=0.01162, audio_tagging_loss=0.008592, over 3051033.31 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:25:51,367 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585900
2023-11-29 09:25:53,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0
2023-11-29 09:26:08,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3906053.3333333335, ans=0.0
2023-11-29 09:26:12,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3906053.3333333335, ans=0.1
2023-11-29 09:26:14,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3906120.0, ans=0.2
2023-11-29 09:26:16,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3906120.0, ans=0.125
2023-11-29 09:26:43,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0
2023-11-29 09:26:43,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0
2023-11-29 09:26:44,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3906253.3333333335, ans=0.0
2023-11-29 09:26:51,800 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8800, loss[loss=0.06237, simple_loss=0.09012, pruned_loss=0.008047, audio_tagging_loss=0.009262, over 15148.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08914, pruned_loss=0.01157, audio_tagging_loss=0.008713, over 3056732.76 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:26:51,975 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 585950
2023-11-29 09:26:57,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3906320.0, ans=0.0
2023-11-29 09:26:58,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3906320.0, ans=0.09899494936611666
2023-11-29 09:27:19,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3906453.3333333335, ans=0.125
2023-11-29 09:27:20,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3906453.3333333335, ans=0.2
2023-11-29 09:27:20,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3906453.3333333335, ans=0.2
2023-11-29 09:27:23,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3906453.3333333335, ans=0.0
2023-11-29 09:27:39,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3906586.6666666665, ans=0.125
2023-11-29 09:27:40,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 9.118e+01 9.794e+01 1.059e+02 1.251e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-29 09:27:52,777 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8850, loss[loss=0.05929, simple_loss=0.08297, pruned_loss=0.008182, audio_tagging_loss=0.009618, over 15795.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08947, pruned_loss=0.01158, audio_tagging_loss=0.00865, over 3047567.98 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:27:52,880 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586000
2023-11-29 09:27:53,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0
2023-11-29 09:28:05,781 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 09:28:30,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3906853.3333333335, ans=0.1
2023-11-29 09:28:53,421 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8900, loss[loss=0.07007, simple_loss=0.08972, pruned_loss=0.01782, audio_tagging_loss=0.007392, over 14326.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08987, pruned_loss=0.01161, audio_tagging_loss=0.008571, over 3048219.27 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:28:53,567 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586050
2023-11-29 09:29:07,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2023-11-29 09:29:24,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3907120.0, ans=0.125
2023-11-29 09:29:29,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3907186.6666666665, ans=0.2
2023-11-29 09:29:37,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3907186.6666666665, ans=0.125
2023-11-29 09:29:42,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.012e+01 9.649e+01 1.060e+02 1.281e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 09:29:44,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3907253.3333333335, ans=0.0
2023-11-29 09:29:55,274 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 8950, loss[loss=0.06474, simple_loss=0.09169, pruned_loss=0.01091, audio_tagging_loss=0.007984, over 15640.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.09, pruned_loss=0.01156, audio_tagging_loss=0.008407, over 3053116.30 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:29:55,363 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586100
2023-11-29 09:30:01,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3907320.0, ans=6.0
2023-11-29 09:30:03,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5
2023-11-29 09:30:13,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3907386.6666666665, ans=0.0
2023-11-29 09:30:14,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3907386.6666666665, ans=0.2
2023-11-29 09:30:15,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3907386.6666666665, ans=0.1
2023-11-29 09:30:32,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3907520.0, ans=0.125
2023-11-29 09:30:32,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5
2023-11-29 09:30:48,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3907586.6666666665, ans=0.125
2023-11-29 09:30:48,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3907586.6666666665, ans=0.2
2023-11-29 09:30:48,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3907586.6666666665, ans=0.0
2023-11-29 09:30:49,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3907586.6666666665, ans=0.0
2023-11-29 09:30:56,197 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9000, loss[loss=0.06099, simple_loss=0.07924, pruned_loss=0.008773, audio_tagging_loss=0.0126, over 14642.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.09035, pruned_loss=0.01173, audio_tagging_loss=0.008455, over 3046797.77 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:30:56,198 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-29 09:31:16,099 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4259, 3.3459, 3.7877, 3.5059], device='cuda:1')
2023-11-29 09:31:19,618 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.4794, 2.8872, 2.3959, 2.9753, 2.9152, 2.8009, 2.9004, 2.6900], device='cuda:1')
2023-11-29 09:31:35,999 INFO [train_asr.py:1267] (1/4) Epoch 49, validation: loss=0.05863, simple_loss=0.05047, pruned_loss=0.00547, audio_tagging_loss=0.02792, over 4681554.00 frames.
2023-11-29 09:31:36,000 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-29 09:31:36,106 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586150
2023-11-29 09:31:49,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0
2023-11-29 09:32:02,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3907786.6666666665, ans=0.125
2023-11-29 09:32:09,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3907786.6666666665, ans=0.09899494936611666
2023-11-29 09:32:14,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3907853.3333333335, ans=0.125
2023-11-29 09:32:21,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3907853.3333333335, ans=0.1
2023-11-29 09:32:25,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 9.291e+01 1.003e+02 1.087e+02 1.506e+02, threshold=2.006e+02, percent-clipped=0.0
2023-11-29 09:32:37,786 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9050, loss[loss=0.07175, simple_loss=0.1037, pruned_loss=0.01326, audio_tagging_loss=0.006643, over 15230.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09062, pruned_loss=0.01174, audio_tagging_loss=0.008405, over 3048938.58 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:32:37,898 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586200
2023-11-29 09:32:51,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0
2023-11-29 09:33:07,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3908120.0, ans=0.0
2023-11-29 09:33:39,631 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9100, loss[loss=0.029, simple_loss=0.03326, pruned_loss=0.001103, audio_tagging_loss=0.01126, over 15885.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08926, pruned_loss=0.01163, audio_tagging_loss=0.00847, over 3050529.32 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:33:39,706 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586250
2023-11-29 09:33:43,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3908320.0, ans=0.2
2023-11-29 09:33:54,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3908386.6666666665, ans=0.125
2023-11-29 09:34:11,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3908453.3333333335, ans=12.0
2023-11-29 09:34:19,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0
2023-11-29 09:34:30,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 9.126e+01 9.745e+01 1.074e+02 1.723e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-29 09:34:37,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0
2023-11-29 09:34:37,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3908586.6666666665, ans=0.0
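The validation block above shows the periodic evaluation pass that interrupts training: attention-entropy tensors are dumped as diagnostics, then a single frame-weighted loss is reported over the fixed 4681554-frame dev set, followed by the peak GPU memory figure. A minimal sketch of that evaluation loop; compute_loss and dev_loader stand in for the script's own helpers:

    # Sketch of the periodic validation step behind
    # "Computing validation loss" ... "Epoch 49, validation: loss=...".
    import torch

    def validate(model, dev_loader, compute_loss):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, num_frames = compute_loss(model, batch)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # frame-weighted average over the dev set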
2023-11-29 09:34:41,031 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9150, loss[loss=0.07127, simple_loss=0.09803, pruned_loss=0.01531, audio_tagging_loss=0.006939, over 15083.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08951, pruned_loss=0.01172, audio_tagging_loss=0.008331, over 3050991.35 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:34:41,120 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586300
2023-11-29 09:35:28,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0
2023-11-29 09:35:44,162 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9200, loss[loss=0.07228, simple_loss=0.106, pruned_loss=0.01479, audio_tagging_loss=0.004496, over 16242.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08881, pruned_loss=0.01159, audio_tagging_loss=0.008281, over 3054729.78 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:35:44,249 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586350
2023-11-29 09:36:13,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3909120.0, ans=0.025
2023-11-29 09:36:34,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.956e+01 9.492e+01 1.042e+02 1.619e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-29 09:36:45,681 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9250, loss[loss=0.05956, simple_loss=0.07436, pruned_loss=0.009247, audio_tagging_loss=0.01314, over 15493.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08981, pruned_loss=0.01181, audio_tagging_loss=0.008252, over 3062563.80 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:36:45,791 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586400
2023-11-29 09:36:55,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3909320.0, ans=0.2
2023-11-29 09:36:58,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3909386.6666666665, ans=0.125
2023-11-29 09:37:03,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3909386.6666666665, ans=0.2
2023-11-29 09:37:07,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3909386.6666666665, ans=0.025
2023-11-29 09:37:25,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909520.0, ans=0.1
2023-11-29 09:37:27,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3909520.0, ans=0.125
2023-11-29 09:37:34,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3909586.6666666665, ans=0.1
2023-11-29 09:37:38,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0
2023-11-29 09:37:41,662 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 09:37:47,399 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9300, loss[loss=0.07873, simple_loss=0.1174, pruned_loss=0.01182, audio_tagging_loss=0.008197, over 15607.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08954, pruned_loss=0.01174, audio_tagging_loss=0.008287, over 3067546.68 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:37:47,520 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586450
2023-11-29 09:38:07,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3909720.0, ans=0.125
2023-11-29 09:38:27,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3909853.3333333335, ans=0.125
2023-11-29 09:38:28,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3909853.3333333335, ans=0.2
2023-11-29 09:38:38,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 9.042e+01 9.889e+01 1.074e+02 1.345e+02, threshold=1.978e+02, percent-clipped=0.0
2023-11-29 09:38:40,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3909920.0, ans=0.2
2023-11-29 09:38:43,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3909920.0, ans=0.125
2023-11-29 09:38:49,394 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9350, loss[loss=0.05992, simple_loss=0.07715, pruned_loss=0.01038, audio_tagging_loss=0.01097, over 15835.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08905, pruned_loss=0.01175, audio_tagging_loss=0.008329, over 3059955.41 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:38:49,478 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586500
2023-11-29 09:39:03,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=10.0
2023-11-29 09:39:05,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3910053.3333333335, ans=0.2
2023-11-29 09:39:35,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3910186.6666666665, ans=0.0
2023-11-29 09:39:42,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3910253.3333333335, ans=0.125
2023-11-29 09:39:47,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3910253.3333333335, ans=0.125
2023-11-29 09:39:51,653 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9400, loss[loss=0.08086, simple_loss=0.1047, pruned_loss=0.01759, audio_tagging_loss=0.01089, over 15167.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08913, pruned_loss=0.01167, audio_tagging_loss=0.008427, over 3057227.32 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:39:51,737 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586550
2023-11-29 09:39:58,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3910320.0, ans=0.125
2023-11-29 09:40:11,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3910386.6666666665, ans=0.2
2023-11-29 09:40:22,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3910453.3333333335, ans=0.125
2023-11-29 09:40:36,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3910520.0, ans=0.125
2023-11-29 09:40:42,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.989e+01 9.545e+01 1.042e+02 1.282e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-29 09:40:49,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0
2023-11-29 09:40:52,761 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 09:40:53,895 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9450, loss[loss=0.06897, simple_loss=0.09884, pruned_loss=0.01097, audio_tagging_loss=0.008577, over 13874.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09034, pruned_loss=0.01194, audio_tagging_loss=0.008453, over 3055963.83 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:40:54,004 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586600
2023-11-29 09:41:08,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3910720.0, ans=0.125
2023-11-29 09:41:11,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=15.0
2023-11-29 09:41:13,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.38 vs. limit=10.0
2023-11-29 09:41:23,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3910786.6666666665, ans=0.2
2023-11-29 09:41:33,850 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 09:41:43,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3910920.0, ans=0.0
2023-11-29 09:41:48,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3910920.0, ans=0.1
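The optim.py records summarize the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max) and derive the clipping threshold from them; with Clipping_scale=2.0 the logged thresholds sit at almost exactly twice the logged median (1.909e+02 against a median of 9.545e+01 just above), and percent-clipped reports how often a batch actually hit the threshold. A sketch of median-based clipping; the window size and exact quantile bookkeeping are assumptions:

    # Sketch of grad-norm clipping with a threshold tied to the running median,
    # matching "Clipping_scale=2.0, grad-norm quartiles ... threshold=...".
    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)   # assumed window of recent norms

        def clip_(self, params) -> None:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
            self.norms.append(norm.item())
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.scale * median
            if norm > threshold:                # such batches count as "percent-clipped"
                for g in grads:
                    g.mul_(threshold / norm)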
2023-11-29 09:41:55,137 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9500, loss[loss=0.06915, simple_loss=0.09513, pruned_loss=0.01433, audio_tagging_loss=0.007259, over 16120.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08944, pruned_loss=0.01176, audio_tagging_loss=0.008447, over 3060927.45 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:41:55,244 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586650
2023-11-29 09:42:03,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3910986.6666666665, ans=0.125
2023-11-29 09:42:06,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3911053.3333333335, ans=0.0
2023-11-29 09:42:21,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3911120.0, ans=0.1
2023-11-29 09:42:21,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0
2023-11-29 09:42:27,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3911120.0, ans=0.1
2023-11-29 09:42:29,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3911120.0, ans=0.125
2023-11-29 09:42:39,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3911186.6666666665, ans=0.125
2023-11-29 09:42:43,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3911253.3333333335, ans=0.125
2023-11-29 09:42:45,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.983e+01 9.496e+01 1.015e+02 1.388e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-29 09:42:46,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3911253.3333333335, ans=0.09899494936611666
2023-11-29 09:42:51,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3911253.3333333335, ans=0.0
2023-11-29 09:42:52,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3911253.3333333335, ans=10.0
2023-11-29 09:42:55,718 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9550, loss[loss=0.05598, simple_loss=0.08009, pruned_loss=0.009937, audio_tagging_loss=0.005996, over 14508.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.0885, pruned_loss=0.01168, audio_tagging_loss=0.008557, over 3055668.14 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:42:55,807 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586700
2023-11-29 09:43:01,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=22.5
2023-11-29 09:43:03,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3911320.0, ans=0.125
2023-11-29 09:43:27,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3911453.3333333335, ans=0.2
2023-11-29 09:43:30,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3911453.3333333335, ans=0.0
2023-11-29 09:43:36,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3911520.0, ans=0.125
2023-11-29 09:43:45,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3911586.6666666665, ans=0.0
2023-11-29 09:43:58,371 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9600, loss[loss=0.04996, simple_loss=0.06548, pruned_loss=0.008582, audio_tagging_loss=0.008645, over 14535.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.0882, pruned_loss=0.01169, audio_tagging_loss=0.008619, over 3041819.43 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0
2023-11-29 09:43:58,500 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586750
2023-11-29 09:43:59,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3911653.3333333335, ans=0.125
2023-11-29 09:44:04,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3911653.3333333335, ans=0.5
2023-11-29 09:44:05,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3911653.3333333335, ans=0.2
2023-11-29 09:44:22,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3911786.6666666665, ans=0.125
2023-11-29 09:44:46,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3911920.0, ans=0.1
2023-11-29 09:44:49,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 9.041e+01 9.598e+01 1.061e+02 1.358e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-29 09:45:00,437 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9650, loss[loss=0.06026, simple_loss=0.07235, pruned_loss=0.01254, audio_tagging_loss=0.01155, over 14157.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08822, pruned_loss=0.01188, audio_tagging_loss=0.008594, over 3037580.67 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:45:00,555 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586800
2023-11-29 09:45:05,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3911986.6666666665, ans=0.125
2023-11-29 09:45:13,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3912053.3333333335, ans=0.125
2023-11-29 09:45:36,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3912186.6666666665, ans=0.025
2023-11-29 09:46:00,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3912320.0, ans=0.0
2023-11-29 09:46:01,548 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9700, loss[loss=0.0582, simple_loss=0.07737, pruned_loss=0.01186, audio_tagging_loss=0.00765, over 16564.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08807, pruned_loss=0.01183, audio_tagging_loss=0.008466, over 3033422.97 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:46:01,672 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586850
2023-11-29 09:46:06,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3912320.0, ans=0.0
2023-11-29 09:46:13,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3912386.6666666665, ans=0.125
2023-11-29 09:46:21,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3912386.6666666665, ans=0.125
2023-11-29 09:46:27,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3912453.3333333335, ans=0.1
2023-11-29 09:46:51,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3912586.6666666665, ans=0.125
2023-11-29 09:46:52,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0
2023-11-29 09:46:53,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.022e+01 9.629e+01 1.062e+02 1.416e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-29 09:46:57,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3912586.6666666665, ans=0.125
2023-11-29 09:47:03,540 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9750, loss[loss=0.05953, simple_loss=0.08476, pruned_loss=0.009161, audio_tagging_loss=0.007989, over 15763.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08823, pruned_loss=0.01183, audio_tagging_loss=0.008369, over 3036904.17 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:47:03,627 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586900
2023-11-29 09:47:05,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5
2023-11-29 09:47:50,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3912853.3333333335, ans=0.125
2023-11-29 09:48:07,022 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9800, loss[loss=0.07063, simple_loss=0.09058, pruned_loss=0.01697, audio_tagging_loss=0.008374, over 14173.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08825, pruned_loss=0.01189, audio_tagging_loss=0.008356, over 3033387.90 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0
2023-11-29 09:48:07,119 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 586950
2023-11-29 09:48:07,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.86 vs. limit=10.0
2023-11-29 09:48:08,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0
2023-11-29 09:48:11,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3912986.6666666665, ans=0.0
2023-11-29 09:48:24,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3913053.3333333335, ans=0.2
2023-11-29 09:48:34,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3913120.0, ans=0.0
2023-11-29 09:48:55,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0
2023-11-29 09:48:59,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.885e+01 9.275e+01 1.009e+02 1.063e+02 1.388e+02, threshold=2.019e+02, percent-clipped=0.0
2023-11-29 09:49:02,208 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 09:49:02,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0
2023-11-29 09:49:08,181 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9850, loss[loss=0.05845, simple_loss=0.08014, pruned_loss=0.009774, audio_tagging_loss=0.008604, over 15409.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08937, pruned_loss=0.01199, audio_tagging_loss=0.008322, over 3047545.21 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 8.0
2023-11-29 09:49:08,265 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587000
2023-11-29 09:50:04,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3913586.6666666665, ans=0.0
2023-11-29 09:50:05,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3913586.6666666665, ans=0.125
2023-11-29 09:50:10,934 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9900, loss[loss=0.05792, simple_loss=0.07068, pruned_loss=0.0135, audio_tagging_loss=0.009081, over 14326.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09057, pruned_loss=0.01228, audio_tagging_loss=0.00823, over 3051379.34 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 8.0
2023-11-29 09:50:11,010 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587050
2023-11-29 09:50:13,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3913653.3333333335, ans=0.125
2023-11-29 09:50:44,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3913786.6666666665, ans=0.125
2023-11-29 09:50:50,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3913853.3333333335, ans=0.125
2023-11-29 09:50:54,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3913853.3333333335, ans=0.2
2023-11-29 09:50:58,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3913853.3333333335, ans=0.0
2023-11-29 09:51:03,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.220e+01 9.633e+01 1.039e+02 1.360e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-29 09:51:05,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3913920.0, ans=0.2
2023-11-29 09:51:11,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3913986.6666666665, ans=0.125
2023-11-29 09:51:12,769 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 9950, loss[loss=0.06636, simple_loss=0.08612, pruned_loss=0.01399, audio_tagging_loss=0.009312, over 14140.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09053, pruned_loss=0.01215, audio_tagging_loss=0.008195, over 3050769.75 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 8.0
2023-11-29 09:51:12,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587100
2023-11-29 09:51:29,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0
2023-11-29 09:51:37,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3914120.0, ans=0.1
2023-11-29 09:51:44,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3914120.0, ans=0.125
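The grad_scale column in the batch records has stepped from 32.0 through 16.0 down to 8.0 across this stretch, and climbs back to 16.0 at batch 10000 below: the usual dynamic loss-scaling behaviour of mixed-precision training, where the scale is halved whenever scaled gradients overflow and is grown again after a run of overflow-free steps. A sketch of the corresponding PyTorch mechanism; the specific growth cadence used by the training script is an assumption:

    # Sketch of dynamic loss scaling consistent with the grad_scale column
    # (32.0 -> 16.0 -> 8.0 -> 16.0): halve on overflow, grow after clean steps.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if inf/nan gradients are found
        scaler.update()          # adjusts the scale reported as "grad_scale"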
limit=15.0 2023-11-29 09:51:56,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3914186.6666666665, ans=0.0 2023-11-29 09:51:59,210 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:52:14,042 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10000, loss[loss=0.07961, simple_loss=0.1101, pruned_loss=0.01816, audio_tagging_loss=0.00638, over 15528.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09062, pruned_loss=0.01214, audio_tagging_loss=0.008242, over 3048017.49 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:52:14,176 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587150 2023-11-29 09:52:26,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3914386.6666666665, ans=0.125 2023-11-29 09:52:26,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3914386.6666666665, ans=0.125 2023-11-29 09:53:07,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.076e+01 9.668e+01 1.049e+02 3.214e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 09:53:15,319 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10050, loss[loss=0.06617, simple_loss=0.07852, pruned_loss=0.01846, audio_tagging_loss=0.008452, over 14392.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09049, pruned_loss=0.01227, audio_tagging_loss=0.008326, over 3039013.25 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:53:15,426 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587200 2023-11-29 09:53:25,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3914653.3333333335, ans=0.0 2023-11-29 09:54:16,936 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10100, loss[loss=0.07179, simple_loss=0.09744, pruned_loss=0.01507, audio_tagging_loss=0.008004, over 14617.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08987, pruned_loss=0.012, audio_tagging_loss=0.008421, over 3041246.83 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:54:17,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587250 2023-11-29 09:54:25,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3914986.6666666665, ans=0.125 2023-11-29 09:54:42,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3915120.0, ans=0.125 2023-11-29 09:54:46,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3915120.0, ans=0.1 2023-11-29 09:54:49,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3915120.0, ans=0.125 2023-11-29 09:54:51,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3915120.0, ans=0.1 2023-11-29 09:55:06,677 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:10,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.039e+01 9.666e+01 1.026e+02 1.279e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 09:55:19,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-11-29 09:55:19,730 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10150, loss[loss=0.08492, simple_loss=0.1252, pruned_loss=0.01376, audio_tagging_loss=0.008537, over 15490.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08964, pruned_loss=0.01196, audio_tagging_loss=0.008506, over 3040146.43 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:55:19,814 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587300 2023-11-29 09:55:35,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3915386.6666666665, ans=0.0 2023-11-29 09:55:39,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3915386.6666666665, ans=0.125 2023-11-29 09:55:48,763 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:50,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3915453.3333333335, ans=0.0 2023-11-29 09:56:15,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3915586.6666666665, ans=0.125 2023-11-29 09:56:20,714 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10200, loss[loss=0.05988, simple_loss=0.08837, pruned_loss=0.009938, audio_tagging_loss=0.00575, over 15014.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09022, pruned_loss=0.01202, audio_tagging_loss=0.008422, over 3055029.39 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:56:20,816 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587350 2023-11-29 09:56:44,845 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 09:56:45,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3915786.6666666665, ans=0.125 2023-11-29 09:56:48,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3915786.6666666665, ans=0.0 2023-11-29 09:56:49,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3915786.6666666665, ans=0.2 2023-11-29 09:57:00,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3915853.3333333335, ans=0.0 2023-11-29 09:57:12,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-29 09:57:14,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.990e+01 9.483e+01 1.035e+02 1.443e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 09:57:22,299 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10250, loss[loss=0.06608, simple_loss=0.08543, pruned_loss=0.01268, audio_tagging_loss=0.01069, over 14388.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08915, pruned_loss=0.01191, audio_tagging_loss=0.008495, over 3058355.29 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:57:22,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587400 2023-11-29 09:57:32,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-29 09:57:40,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3916053.3333333335, ans=0.0 2023-11-29 09:57:46,017 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:57:56,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3916120.0, ans=0.125 2023-11-29 09:57:59,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3916186.6666666665, ans=0.125 2023-11-29 09:58:25,267 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10300, loss[loss=0.06151, simple_loss=0.08856, pruned_loss=0.008652, audio_tagging_loss=0.008577, over 15786.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08977, pruned_loss=0.0121, audio_tagging_loss=0.008541, over 3057564.63 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:58:25,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587450 2023-11-29 09:59:01,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3916520.0, ans=0.125 2023-11-29 09:59:10,360 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:59:18,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 9.346e+01 9.831e+01 1.081e+02 1.349e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 09:59:27,077 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10350, loss[loss=0.07274, simple_loss=0.1042, pruned_loss=0.01355, audio_tagging_loss=0.007109, over 15503.00 frames. 
], tot_loss[loss=0.0661, simple_loss=0.09097, pruned_loss=0.01208, audio_tagging_loss=0.008536, over 3059785.96 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:59:27,150 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587500 2023-11-29 09:59:41,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3916720.0, ans=0.125 2023-11-29 09:59:44,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3916720.0, ans=0.1 2023-11-29 09:59:51,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3916786.6666666665, ans=0.1 2023-11-29 09:59:53,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3916786.6666666665, ans=0.1 2023-11-29 10:00:05,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5 2023-11-29 10:00:08,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3916853.3333333335, ans=0.125 2023-11-29 10:00:28,917 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10400, loss[loss=0.06877, simple_loss=0.09939, pruned_loss=0.01374, audio_tagging_loss=0.005335, over 15170.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09017, pruned_loss=0.01192, audio_tagging_loss=0.008606, over 3054483.81 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:00:29,016 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587550 2023-11-29 10:00:30,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3916986.6666666665, ans=0.0 2023-11-29 10:00:30,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-11-29 10:00:32,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3916986.6666666665, ans=0.1 2023-11-29 10:00:42,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3917053.3333333335, ans=0.125 2023-11-29 10:00:52,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917120.0, ans=0.1 2023-11-29 10:00:55,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=22.5 2023-11-29 10:01:06,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3917186.6666666665, ans=0.05 2023-11-29 10:01:22,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.130e+01 9.665e+01 1.040e+02 1.258e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 10:01:31,599 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10450, loss[loss=0.05697, simple_loss=0.06724, pruned_loss=0.01244, audio_tagging_loss=0.01092, over 14308.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09018, pruned_loss=0.01192, audio_tagging_loss=0.008624, over 3051681.89 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:01:31,718 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587600 2023-11-29 10:01:44,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3917386.6666666665, ans=0.125 2023-11-29 10:01:46,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3917386.6666666665, ans=0.0 2023-11-29 10:01:53,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3917386.6666666665, ans=0.125 2023-11-29 10:01:53,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-29 10:02:16,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=3917520.0, ans=15.0 2023-11-29 10:02:33,197 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10500, loss[loss=0.06782, simple_loss=0.09077, pruned_loss=0.01387, audio_tagging_loss=0.008562, over 14806.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08986, pruned_loss=0.01179, audio_tagging_loss=0.00849, over 3045810.53 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:02:33,381 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587650 2023-11-29 10:02:34,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3917653.3333333335, ans=0.0 2023-11-29 10:02:38,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3917653.3333333335, ans=0.0 2023-11-29 10:02:45,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3917720.0, ans=0.0 2023-11-29 10:02:45,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2023-11-29 10:03:10,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3917853.3333333335, ans=0.2 2023-11-29 10:03:20,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3917853.3333333335, ans=0.5 2023-11-29 10:03:26,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.916e+01 9.691e+01 1.026e+02 1.437e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 10:03:27,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3917920.0, ans=0.125 2023-11-29 10:03:28,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3917920.0, ans=0.125 2023-11-29 10:03:32,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3917920.0, ans=0.125 2023-11-29 10:03:35,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.77 vs. 
limit=15.0 2023-11-29 10:03:35,736 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10550, loss[loss=0.05754, simple_loss=0.07362, pruned_loss=0.009532, audio_tagging_loss=0.0112, over 15565.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08989, pruned_loss=0.01154, audio_tagging_loss=0.008438, over 3049012.14 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:03:35,841 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587700 2023-11-29 10:04:28,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-29 10:04:31,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. limit=10.0 2023-11-29 10:04:32,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3918253.3333333335, ans=0.0 2023-11-29 10:04:36,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3918253.3333333335, ans=0.125 2023-11-29 10:04:38,655 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10600, loss[loss=0.0767, simple_loss=0.1026, pruned_loss=0.01753, audio_tagging_loss=0.007847, over 16311.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08946, pruned_loss=0.01175, audio_tagging_loss=0.008394, over 3043304.49 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:04:38,756 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587750 2023-11-29 10:05:20,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3918520.0, ans=0.2 2023-11-29 10:05:31,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.953e+01 9.680e+01 1.038e+02 1.330e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 10:05:32,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0 2023-11-29 10:05:40,134 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10650, loss[loss=0.08051, simple_loss=0.1133, pruned_loss=0.01563, audio_tagging_loss=0.008243, over 16093.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.09006, pruned_loss=0.01187, audio_tagging_loss=0.008337, over 3041869.54 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:05:40,240 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587800 2023-11-29 10:05:45,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3918653.3333333335, ans=0.0 2023-11-29 10:06:01,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3918720.0, ans=0.125 2023-11-29 10:06:15,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3918786.6666666665, ans=0.125 2023-11-29 10:06:32,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3918920.0, ans=0.2 2023-11-29 10:06:42,313 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10700, loss[loss=0.07216, simple_loss=0.1063, pruned_loss=0.01233, audio_tagging_loss=0.00666, over 14581.00 frames. 
], tot_loss[loss=0.06458, simple_loss=0.08922, pruned_loss=0.01154, audio_tagging_loss=0.008424, over 3046531.95 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:06:42,397 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587850 2023-11-29 10:06:43,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2023-11-29 10:06:47,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3918986.6666666665, ans=0.0 2023-11-29 10:06:48,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3918986.6666666665, ans=0.125 2023-11-29 10:06:49,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3918986.6666666665, ans=0.125 2023-11-29 10:06:56,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3919053.3333333335, ans=0.125 2023-11-29 10:07:07,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3919120.0, ans=0.2 2023-11-29 10:07:20,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3919186.6666666665, ans=0.0 2023-11-29 10:07:22,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-29 10:07:35,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3919253.3333333335, ans=0.125 2023-11-29 10:07:36,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.064e+01 9.584e+01 1.018e+02 1.338e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:07:37,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3919253.3333333335, ans=0.0 2023-11-29 10:07:44,306 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10750, loss[loss=0.07478, simple_loss=0.1046, pruned_loss=0.01327, audio_tagging_loss=0.009204, over 16314.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08866, pruned_loss=0.01159, audio_tagging_loss=0.008397, over 3043694.55 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:07:44,427 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587900 2023-11-29 10:07:44,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3919320.0, ans=0.125 2023-11-29 10:07:49,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3919320.0, ans=0.125 2023-11-29 10:08:14,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3919453.3333333335, ans=10.0 2023-11-29 10:08:25,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3919520.0, ans=0.2 2023-11-29 10:08:39,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3919586.6666666665, ans=0.1 2023-11-29 10:08:44,912 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10800, loss[loss=0.05517, simple_loss=0.07329, pruned_loss=0.0111, audio_tagging_loss=0.007432, over 16280.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.088, pruned_loss=0.01149, audio_tagging_loss=0.008458, over 3042482.18 frames. ], batch size: 64, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:08:44,998 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 587950 2023-11-29 10:08:54,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2023-11-29 10:09:39,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 9.085e+01 9.620e+01 1.015e+02 1.229e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:09:47,016 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10850, loss[loss=0.06696, simple_loss=0.08861, pruned_loss=0.01223, audio_tagging_loss=0.01042, over 15465.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.0887, pruned_loss=0.01183, audio_tagging_loss=0.008436, over 3041882.72 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:09:47,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588000 2023-11-29 10:09:55,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3919986.6666666665, ans=0.125 2023-11-29 10:09:55,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3919986.6666666665, ans=0.125 2023-11-29 10:10:21,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3920120.0, ans=0.125 2023-11-29 10:10:27,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3920186.6666666665, ans=0.2 2023-11-29 10:10:32,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3920186.6666666665, ans=0.125 2023-11-29 10:10:37,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3920186.6666666665, ans=0.125 2023-11-29 10:10:39,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. 
limit=6.0 2023-11-29 10:10:46,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3920253.3333333335, ans=0.125 2023-11-29 10:10:48,976 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:10:51,871 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10900, loss[loss=0.07011, simple_loss=0.1016, pruned_loss=0.01336, audio_tagging_loss=0.005964, over 15010.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08911, pruned_loss=0.01188, audio_tagging_loss=0.008478, over 3049662.37 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:10:51,988 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588050 2023-11-29 10:11:17,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2023-11-29 10:11:29,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3920520.0, ans=0.125 2023-11-29 10:11:46,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3920586.6666666665, ans=0.125 2023-11-29 10:11:47,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.221e+01 9.905e+01 1.064e+02 1.744e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 10:11:53,347 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 10950, loss[loss=0.04601, simple_loss=0.05007, pruned_loss=0.01002, audio_tagging_loss=0.01095, over 16002.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0896, pruned_loss=0.01196, audio_tagging_loss=0.008422, over 3055117.74 frames. ], batch size: 63, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:11:53,433 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588100 2023-11-29 10:12:11,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3920720.0, ans=0.0 2023-11-29 10:12:11,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3920720.0, ans=0.125 2023-11-29 10:12:30,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3920853.3333333335, ans=0.2 2023-11-29 10:12:33,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3920853.3333333335, ans=0.2 2023-11-29 10:12:50,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2023-11-29 10:12:54,879 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11000, loss[loss=0.06688, simple_loss=0.09412, pruned_loss=0.0104, audio_tagging_loss=0.009419, over 14366.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08936, pruned_loss=0.01183, audio_tagging_loss=0.008508, over 3052962.84 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:12:55,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588150 2023-11-29 10:13:02,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2023-11-29 10:13:06,277 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:13:23,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3921120.0, ans=0.125 2023-11-29 10:13:50,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.054e+01 9.771e+01 1.053e+02 1.277e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 10:13:56,543 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11050, loss[loss=0.07312, simple_loss=0.09226, pruned_loss=0.01728, audio_tagging_loss=0.009707, over 14528.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08976, pruned_loss=0.01197, audio_tagging_loss=0.008596, over 3047117.76 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:13:56,625 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588200 2023-11-29 10:14:04,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3921320.0, ans=0.125 2023-11-29 10:14:27,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3921453.3333333335, ans=0.05 2023-11-29 10:14:52,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3921586.6666666665, ans=0.0 2023-11-29 10:14:58,999 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11100, loss[loss=0.06766, simple_loss=0.09449, pruned_loss=0.01328, audio_tagging_loss=0.007132, over 15539.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08918, pruned_loss=0.01203, audio_tagging_loss=0.008687, over 3049257.78 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:14:59,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588250 2023-11-29 10:15:17,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3921720.0, ans=0.2 2023-11-29 10:15:35,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3921853.3333333335, ans=0.0 2023-11-29 10:15:53,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.126e+01 9.732e+01 1.035e+02 1.401e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 10:15:55,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3921920.0, ans=0.125 2023-11-29 10:15:59,059 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11150, loss[loss=0.06322, simple_loss=0.09669, pruned_loss=0.007005, audio_tagging_loss=0.007871, over 16542.00 frames. 
], tot_loss[loss=0.06542, simple_loss=0.0894, pruned_loss=0.01198, audio_tagging_loss=0.008746, over 3049132.35 frames. ], batch size: 63, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:15:59,158 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588300 2023-11-29 10:16:31,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-11-29 10:16:58,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3922253.3333333335, ans=0.0 2023-11-29 10:16:58,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3922253.3333333335, ans=0.0 2023-11-29 10:17:00,576 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11200, loss[loss=0.05803, simple_loss=0.07655, pruned_loss=0.01203, audio_tagging_loss=0.007732, over 15361.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0891, pruned_loss=0.01197, audio_tagging_loss=0.008836, over 3050253.46 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:17:00,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588350 2023-11-29 10:17:03,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3922320.0, ans=0.04949747468305833 2023-11-29 10:17:42,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3922520.0, ans=0.2 2023-11-29 10:17:56,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 9.269e+01 9.742e+01 1.059e+02 1.357e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 10:18:02,861 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11250, loss[loss=0.06742, simple_loss=0.09559, pruned_loss=0.01232, audio_tagging_loss=0.007306, over 14691.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08935, pruned_loss=0.01199, audio_tagging_loss=0.008785, over 3051152.73 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:18:02,963 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588400 2023-11-29 10:18:21,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3922720.0, ans=0.0 2023-11-29 10:18:56,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3922920.0, ans=0.0 2023-11-29 10:18:59,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3922920.0, ans=0.1 2023-11-29 10:19:00,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-29 10:19:03,740 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11300, loss[loss=0.05112, simple_loss=0.06764, pruned_loss=0.007998, audio_tagging_loss=0.009299, over 15449.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08924, pruned_loss=0.01188, audio_tagging_loss=0.008675, over 3048345.58 frames. 
], batch size: 61, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:19:03,874 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588450 2023-11-29 10:19:14,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3922986.6666666665, ans=0.125 2023-11-29 10:19:36,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3923120.0, ans=0.1 2023-11-29 10:19:37,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3923120.0, ans=0.0 2023-11-29 10:19:44,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3923186.6666666665, ans=0.125 2023-11-29 10:19:58,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 9.209e+01 9.911e+01 1.088e+02 1.767e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 10:20:04,427 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11350, loss[loss=0.0513, simple_loss=0.06977, pruned_loss=0.006624, audio_tagging_loss=0.009794, over 13820.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09034, pruned_loss=0.01202, audio_tagging_loss=0.008471, over 3045780.03 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:20:04,557 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588500 2023-11-29 10:20:09,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3923320.0, ans=0.02 2023-11-29 10:20:12,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-11-29 10:20:35,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3923453.3333333335, ans=0.2 2023-11-29 10:20:39,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3923453.3333333335, ans=0.125 2023-11-29 10:20:56,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3923586.6666666665, ans=0.125 2023-11-29 10:21:06,492 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11400, loss[loss=0.07257, simple_loss=0.0997, pruned_loss=0.01307, audio_tagging_loss=0.009653, over 15210.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08974, pruned_loss=0.01195, audio_tagging_loss=0.008396, over 3047423.68 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:21:06,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588550 2023-11-29 10:21:23,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3923720.0, ans=0.0 2023-11-29 10:21:25,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3923720.0, ans=0.125 2023-11-29 10:21:27,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3923720.0, ans=0.1 2023-11-29 10:21:33,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3923786.6666666665, ans=0.1 2023-11-29 10:21:47,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3923853.3333333335, ans=0.0 2023-11-29 10:21:57,215 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:22:01,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.974e+01 9.585e+01 1.035e+02 2.029e+02, threshold=1.917e+02, percent-clipped=1.0 2023-11-29 10:22:04,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3923920.0, ans=0.125 2023-11-29 10:22:07,788 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11450, loss[loss=0.05494, simple_loss=0.06561, pruned_loss=0.01156, audio_tagging_loss=0.01057, over 13562.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08927, pruned_loss=0.01192, audio_tagging_loss=0.008435, over 3043902.10 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:22:07,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588600 2023-11-29 10:22:20,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3924053.3333333335, ans=0.0 2023-11-29 10:23:04,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2023-11-29 10:23:09,770 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11500, loss[loss=0.0717, simple_loss=0.09542, pruned_loss=0.01597, audio_tagging_loss=0.008023, over 14324.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08894, pruned_loss=0.01183, audio_tagging_loss=0.008374, over 3044678.21 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:23:09,873 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588650 2023-11-29 10:23:14,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3924320.0, ans=0.2 2023-11-29 10:23:44,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3924453.3333333335, ans=0.0 2023-11-29 10:24:05,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=22.5 2023-11-29 10:24:06,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.997e+01 9.583e+01 1.037e+02 1.357e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:24:07,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.07 vs. limit=10.0 2023-11-29 10:24:07,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3924586.6666666665, ans=0.1 2023-11-29 10:24:11,593 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11550, loss[loss=0.0526, simple_loss=0.07304, pruned_loss=0.007918, audio_tagging_loss=0.008166, over 14050.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08904, pruned_loss=0.0119, audio_tagging_loss=0.008357, over 3044571.77 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:24:11,686 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588700 2023-11-29 10:24:12,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3924653.3333333335, ans=0.125 2023-11-29 10:24:21,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5 2023-11-29 10:24:27,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3924720.0, ans=0.125 2023-11-29 10:24:34,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3924786.6666666665, ans=0.0 2023-11-29 10:24:41,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3924786.6666666665, ans=0.2 2023-11-29 10:24:49,084 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:24:53,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3924853.3333333335, ans=0.125 2023-11-29 10:25:11,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3924986.6666666665, ans=0.125 2023-11-29 10:25:12,267 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11600, loss[loss=0.08891, simple_loss=0.129, pruned_loss=0.01747, audio_tagging_loss=0.006934, over 15098.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.09011, pruned_loss=0.01202, audio_tagging_loss=0.008369, over 3044424.62 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:25:12,345 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588750 2023-11-29 10:25:31,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3925053.3333333335, ans=0.025 2023-11-29 10:25:32,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3925053.3333333335, ans=0.1 2023-11-29 10:25:34,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3925053.3333333335, ans=0.0 2023-11-29 10:26:09,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.110e+01 9.153e+01 9.919e+01 1.070e+02 2.477e+02, threshold=1.984e+02, percent-clipped=1.0 2023-11-29 10:26:14,604 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11650, loss[loss=0.07558, simple_loss=0.1135, pruned_loss=0.01197, audio_tagging_loss=0.006853, over 15338.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09025, pruned_loss=0.01205, audio_tagging_loss=0.008416, over 3042012.00 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:26:14,685 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588800 2023-11-29 10:26:15,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-29 10:26:29,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2023-11-29 10:26:32,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3925386.6666666665, ans=0.0 2023-11-29 10:27:04,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3925586.6666666665, ans=0.0 2023-11-29 10:27:05,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2023-11-29 10:27:12,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2023-11-29 10:27:17,042 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11700, loss[loss=0.07225, simple_loss=0.0958, pruned_loss=0.01281, audio_tagging_loss=0.01154, over 15922.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.0891, pruned_loss=0.01189, audio_tagging_loss=0.008508, over 3037536.29 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:27:17,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588850 2023-11-29 10:27:42,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3925786.6666666665, ans=0.0 2023-11-29 10:28:06,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3925920.0, ans=0.1 2023-11-29 10:28:12,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3925920.0, ans=0.025 2023-11-29 10:28:14,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.126e+01 9.606e+01 1.033e+02 1.375e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:28:15,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0 2023-11-29 10:28:17,921 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11750, loss[loss=0.05704, simple_loss=0.07552, pruned_loss=0.0116, audio_tagging_loss=0.007685, over 15742.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08841, pruned_loss=0.01178, audio_tagging_loss=0.008587, over 3031937.05 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:28:18,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588900 2023-11-29 10:28:31,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3926053.3333333335, ans=0.1 2023-11-29 10:28:34,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3926053.3333333335, ans=0.0 2023-11-29 10:28:49,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3926120.0, ans=0.125 2023-11-29 10:29:04,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3926186.6666666665, ans=0.05 2023-11-29 10:29:08,721 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:29:20,680 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11800, loss[loss=0.06542, simple_loss=0.08724, pruned_loss=0.01092, audio_tagging_loss=0.01089, over 15449.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08799, pruned_loss=0.01161, audio_tagging_loss=0.008652, over 3039121.49 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:29:20,792 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 588950 2023-11-29 10:29:35,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3926386.6666666665, ans=0.025 2023-11-29 10:29:50,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3926453.3333333335, ans=0.125 2023-11-29 10:29:59,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3926520.0, ans=0.1 2023-11-29 10:30:18,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.043e+01 9.603e+01 1.049e+02 1.292e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:30:20,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2023-11-29 10:30:21,787 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11850, loss[loss=0.06718, simple_loss=0.09978, pruned_loss=0.01043, audio_tagging_loss=0.006857, over 14918.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08825, pruned_loss=0.01173, audio_tagging_loss=0.008652, over 3040671.40 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:30:21,894 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589000 2023-11-29 10:30:23,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3926653.3333333335, ans=0.125 2023-11-29 10:30:27,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3926653.3333333335, ans=0.2 2023-11-29 10:30:39,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3926720.0, ans=0.125 2023-11-29 10:30:42,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3926720.0, ans=0.125 2023-11-29 10:30:56,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3926786.6666666665, ans=0.125 2023-11-29 10:30:59,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=22.5 2023-11-29 10:31:12,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3926920.0, ans=0.0 2023-11-29 10:31:22,952 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11900, loss[loss=0.07351, simple_loss=0.0946, pruned_loss=0.01493, audio_tagging_loss=0.01128, over 15863.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08863, pruned_loss=0.01178, audio_tagging_loss=0.008667, over 3037889.62 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:31:23,068 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589050 2023-11-29 10:31:48,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3927120.0, ans=0.125 2023-11-29 10:31:58,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3927186.6666666665, ans=0.1 2023-11-29 10:32:07,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3927186.6666666665, ans=0.0 2023-11-29 10:32:18,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.048e+01 9.528e+01 1.019e+02 1.404e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 10:32:19,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3927253.3333333335, ans=0.1 2023-11-29 10:32:22,291 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 11950, loss[loss=0.05124, simple_loss=0.06403, pruned_loss=0.008686, audio_tagging_loss=0.01054, over 14064.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08819, pruned_loss=0.01182, audio_tagging_loss=0.008766, over 3038586.33 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:32:22,401 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589100 2023-11-29 10:32:44,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3927386.6666666665, ans=0.0 2023-11-29 10:33:00,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2023-11-29 10:33:13,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3927586.6666666665, ans=0.125 2023-11-29 10:33:19,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3927586.6666666665, ans=0.1 2023-11-29 10:33:21,418 INFO [train_asr.py:1235] (1/4) Epoch 49, batch 12000, loss[loss=0.0556, simple_loss=0.07717, pruned_loss=0.009317, audio_tagging_loss=0.007698, over 14941.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08768, pruned_loss=0.01175, audio_tagging_loss=0.008941, over 3038641.82 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:33:21,419 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 10:33:50,398 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5078, 3.0051, 3.2028, 2.9516, 3.6132, 3.7733, 3.2491, 3.2151], device='cuda:1') 2023-11-29 10:34:01,191 INFO [train_asr.py:1267] (1/4) Epoch 49, validation: loss=0.0581, simple_loss=0.05045, pruned_loss=0.005444, audio_tagging_loss=0.02743, over 4681554.00 frames. 
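
A note on the loss fields in the entries above: the `loss=` value is a weighted sum of the logged components. The sketch below is a minimal reconstruction, not the training code itself; the scales (0.5 for `simple_loss`, 1.0 for `audio_tagging_loss`) are assumptions inferred by fitting the logged numbers, and the name `combined_loss` is hypothetical.

```python
# Hedged sketch of how the logged totals appear to be composed.
# Assumed scales, inferred from the log entries (not read from train_asr.py):
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    """Reproduce the 'loss=' field from its logged components."""
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Check against the Epoch 49 validation entry above:
# loss=0.0581, simple_loss=0.05045, pruned_loss=0.005444,
# audio_tagging_loss=0.02743
assert abs(combined_loss(0.05045, 0.005444, 0.02743) - 0.0581) < 1e-4
```

The same arithmetic reproduces the running `tot_loss` entries, e.g. 0.5 * 0.08825 + 0.01189 + 0.008356 ≈ 0.06437 for Epoch 49, batch 9800.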
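
The recurring "Exclude cut" warnings above drop 1-second AudioSet clips whose placeholder transcript is longer than the acoustic sequence that survives subsampling: 100 input frames reduce to 23, which a transducer cannot align against 24 BPE tokens. Below is a hedged reconstruction of that guard, assuming a factor-4 convolutional subsampling of the form ((T - 7) // 2 + 1) // 2, which matches the logged 100 → 23; `frames_after_subsampling` and `keep_cut` are hypothetical names, not the actual functions in train_asr.py.

```python
# Hedged reconstruction of the cut-exclusion check; the exact guard in
# train_asr.py may differ. The subsampling formula is an assumption that
# matches the logged numbers (100 frames -> 23 frames).

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed conv front end with overall subsampling factor 4.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer cannot emit more tokens than it has (subsampled)
    # frames, so such cuts are excluded from training.
    return frames_after_subsampling(num_frames) >= num_tokens

# The excluded 1-second AudioSet cuts above: 100 frames, 24 placeholder tokens.
assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False
```

Under this check, every such 1.000-second cut would fail by exactly one token, which would explain why the warnings recur for different cut IDs rather than signalling a data error.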
2023-11-29 10:34:01,192 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 10:34:01,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589150 2023-11-29 10:34:11,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3927720.0, ans=0.0 2023-11-29 10:34:12,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3927720.0, ans=0.035 2023-11-29 10:34:46,427 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 0, loss[loss=0.08386, simple_loss=0.1012, pruned_loss=0.0154, audio_tagging_loss=0.01786, over 14816.00 frames. ], tot_loss[loss=0.08386, simple_loss=0.1012, pruned_loss=0.0154, audio_tagging_loss=0.01786, over 14816.00 frames. ], batch size: 56, lr: 1.36e-03, grad_scale: 32.0 2023-11-29 10:34:46,428 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 10:35:03,305 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1347, 3.6982, 4.0603, 3.6803], device='cuda:1') 2023-11-29 10:35:22,077 INFO [train_asr.py:1267] (1/4) Epoch 50, validation: loss=0.05785, simple_loss=0.05049, pruned_loss=0.005519, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-29 10:35:22,078 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 10:35:28,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2023-11-29 10:35:47,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3927940.0, ans=0.0 2023-11-29 10:35:54,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 9.469e+01 1.029e+02 1.110e+02 1.447e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-29 10:35:57,238 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589200 2023-11-29 10:35:57,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3927940.0, ans=0.09899494936611666 2023-11-29 10:35:58,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3927940.0, ans=0.125 2023-11-29 10:36:25,863 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 50, loss[loss=0.07208, simple_loss=0.08983, pruned_loss=0.01352, audio_tagging_loss=0.01364, over 13977.00 frames. ], tot_loss[loss=0.07314, simple_loss=0.08995, pruned_loss=0.01207, audio_tagging_loss=0.0161, over 685149.31 frames. ], batch size: 54, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:36:50,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3928273.3333333335, ans=0.125 2023-11-29 10:37:00,031 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589250 2023-11-29 10:37:13,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.31 vs. 
limit=10.0 2023-11-29 10:37:16,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3928406.6666666665, ans=0.0 2023-11-29 10:37:28,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3928473.3333333335, ans=0.125 2023-11-29 10:37:29,734 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 100, loss[loss=0.07075, simple_loss=0.08798, pruned_loss=0.01107, audio_tagging_loss=0.01569, over 15878.00 frames. ], tot_loss[loss=0.07141, simple_loss=0.08813, pruned_loss=0.01168, audio_tagging_loss=0.01567, over 1204552.21 frames. ], batch size: 61, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:37:35,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2023-11-29 10:37:39,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3928473.3333333335, ans=0.125 2023-11-29 10:37:41,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3928540.0, ans=0.0 2023-11-29 10:37:43,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3928540.0, ans=0.0 2023-11-29 10:37:59,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-11-29 10:38:00,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 9.044e+01 1.010e+02 1.060e+02 1.133e+02 1.839e+02, threshold=2.120e+02, percent-clipped=0.0 2023-11-29 10:38:02,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589300 2023-11-29 10:38:05,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3928673.3333333335, ans=0.125 2023-11-29 10:38:06,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3928673.3333333335, ans=0.0 2023-11-29 10:38:11,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3928673.3333333335, ans=0.0 2023-11-29 10:38:15,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=12.0 2023-11-29 10:38:21,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3928740.0, ans=0.2 2023-11-29 10:38:23,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3928740.0, ans=0.1 2023-11-29 10:38:25,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3928740.0, ans=0.125 2023-11-29 10:38:31,789 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 150, loss[loss=0.07682, simple_loss=0.1138, pruned_loss=0.01393, audio_tagging_loss=0.006007, over 15830.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.08983, pruned_loss=0.01184, audio_tagging_loss=0.01392, over 1618447.23 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:38:52,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-29 10:39:05,949 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589350 2023-11-29 10:39:12,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.76 vs. limit=10.0 2023-11-29 10:39:30,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3929073.3333333335, ans=0.125 2023-11-29 10:39:34,111 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 200, loss[loss=0.05721, simple_loss=0.07752, pruned_loss=0.009581, audio_tagging_loss=0.008873, over 16357.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09034, pruned_loss=0.01211, audio_tagging_loss=0.01235, over 1935898.24 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:39:34,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3929140.0, ans=0.125 2023-11-29 10:39:48,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3929206.6666666665, ans=0.0 2023-11-29 10:39:53,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-11-29 10:39:59,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-29 10:40:01,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2023-11-29 10:40:05,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 9.198e+01 9.931e+01 1.061e+02 1.225e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 10:40:07,803 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589400 2023-11-29 10:40:09,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3929273.3333333335, ans=0.125 2023-11-29 10:40:14,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3929340.0, ans=0.1 2023-11-29 10:40:31,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3929406.6666666665, ans=0.1 2023-11-29 10:40:36,892 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 250, loss[loss=0.06786, simple_loss=0.09633, pruned_loss=0.01068, audio_tagging_loss=0.009011, over 15017.00 frames. ], tot_loss[loss=0.06901, simple_loss=0.09133, pruned_loss=0.01239, audio_tagging_loss=0.01096, over 2180802.91 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:40:38,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3929473.3333333335, ans=0.125 2023-11-29 10:40:44,302 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:40:54,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3929540.0, ans=0.0 2023-11-29 10:40:56,436 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:40:57,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3929540.0, ans=0.125 2023-11-29 10:41:03,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-11-29 10:41:11,080 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589450 2023-11-29 10:41:11,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3929606.6666666665, ans=0.1 2023-11-29 10:41:13,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2023-11-29 10:41:16,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3929673.3333333335, ans=0.125 2023-11-29 10:41:20,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3929673.3333333335, ans=0.125 2023-11-29 10:41:28,126 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:41:30,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2023-11-29 10:41:35,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3929740.0, ans=0.95 2023-11-29 10:41:40,662 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 300, loss[loss=0.06207, simple_loss=0.08643, pruned_loss=0.0114, audio_tagging_loss=0.007457, over 14253.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09125, pruned_loss=0.01229, audio_tagging_loss=0.01035, over 2371512.52 frames. 
], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:41:42,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3929806.6666666665, ans=0.125 2023-11-29 10:41:43,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3929806.6666666665, ans=0.125 2023-11-29 10:41:49,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3929806.6666666665, ans=0.0 2023-11-29 10:41:50,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3929806.6666666665, ans=0.07 2023-11-29 10:42:05,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3929940.0, ans=0.125 2023-11-29 10:42:07,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3929940.0, ans=0.125 2023-11-29 10:42:11,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 9.170e+01 9.850e+01 1.054e+02 1.427e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 10:42:14,002 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589500 2023-11-29 10:42:20,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3930006.6666666665, ans=0.2 2023-11-29 10:42:27,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3930006.6666666665, ans=0.125 2023-11-29 10:42:42,536 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 350, loss[loss=0.05255, simple_loss=0.07568, pruned_loss=0.007163, audio_tagging_loss=0.007551, over 15452.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.08988, pruned_loss=0.01198, audio_tagging_loss=0.009823, over 2528168.74 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:43:07,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3930273.3333333335, ans=0.0 2023-11-29 10:43:16,891 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589550 2023-11-29 10:43:21,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3930340.0, ans=0.125 2023-11-29 10:43:30,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-29 10:43:44,395 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 400, loss[loss=0.04653, simple_loss=0.06521, pruned_loss=0.00527, audio_tagging_loss=0.008656, over 16178.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08941, pruned_loss=0.01205, audio_tagging_loss=0.009414, over 2638587.51 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:44:07,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-29 10:44:08,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3930540.0, ans=15.0 2023-11-29 10:44:13,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.42 vs. 
limit=15.0 2023-11-29 10:44:16,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.147e+01 9.646e+01 1.038e+02 1.524e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 10:44:18,484 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589600 2023-11-29 10:44:26,091 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:44:46,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3930806.6666666665, ans=0.125 2023-11-29 10:44:47,877 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 450, loss[loss=0.05061, simple_loss=0.06203, pruned_loss=0.007486, audio_tagging_loss=0.01211, over 15875.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08916, pruned_loss=0.01196, audio_tagging_loss=0.009155, over 2728085.74 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:45:20,620 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589650 2023-11-29 10:45:48,717 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 500, loss[loss=0.07333, simple_loss=0.1079, pruned_loss=0.01142, audio_tagging_loss=0.007971, over 15483.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08862, pruned_loss=0.01178, audio_tagging_loss=0.009143, over 2798095.87 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:45:57,709 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:45:59,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2023-11-29 10:46:12,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3931273.3333333335, ans=0.125 2023-11-29 10:46:21,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 9.167e+01 9.711e+01 1.057e+02 1.221e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 10:46:23,074 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589700 2023-11-29 10:46:41,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3931406.6666666665, ans=0.125 2023-11-29 10:46:48,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3931406.6666666665, ans=0.0 2023-11-29 10:46:50,334 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 550, loss[loss=0.05722, simple_loss=0.07236, pruned_loss=0.01123, audio_tagging_loss=0.009817, over 14337.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08857, pruned_loss=0.01181, audio_tagging_loss=0.009041, over 2853892.65 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:46:56,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.64 vs. 
limit=12.0 2023-11-29 10:47:21,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3931606.6666666665, ans=0.1 2023-11-29 10:47:23,896 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589750 2023-11-29 10:47:50,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3931740.0, ans=0.1 2023-11-29 10:47:52,409 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 600, loss[loss=0.04817, simple_loss=0.06275, pruned_loss=0.008298, audio_tagging_loss=0.008493, over 15134.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08861, pruned_loss=0.01167, audio_tagging_loss=0.008902, over 2895174.98 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:47:53,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3931806.6666666665, ans=0.125 2023-11-29 10:48:07,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3931873.3333333335, ans=0.2 2023-11-29 10:48:10,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-29 10:48:20,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3931940.0, ans=0.125 2023-11-29 10:48:24,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.984e+01 9.771e+01 1.059e+02 2.081e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 10:48:25,515 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589800 2023-11-29 10:48:45,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-29 10:48:49,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:54,152 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 650, loss[loss=0.06894, simple_loss=0.09229, pruned_loss=0.01382, audio_tagging_loss=0.008984, over 16044.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08844, pruned_loss=0.0117, audio_tagging_loss=0.008873, over 2928817.05 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:48:55,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3932140.0, ans=0.125 2023-11-29 10:49:21,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3932273.3333333335, ans=0.1 2023-11-29 10:49:27,338 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589850 2023-11-29 10:49:27,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3932273.3333333335, ans=0.0 2023-11-29 10:49:29,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. limit=10.0 2023-11-29 10:49:55,258 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 700, loss[loss=0.05811, simple_loss=0.0756, pruned_loss=0.01084, audio_tagging_loss=0.00947, over 15635.00 frames. 
], tot_loss[loss=0.06421, simple_loss=0.08776, pruned_loss=0.01151, audio_tagging_loss=0.008812, over 2954984.44 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:50:06,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3932473.3333333335, ans=0.0 2023-11-29 10:50:21,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-29 10:50:25,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3932606.6666666665, ans=0.0 2023-11-29 10:50:27,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 9.061e+01 9.779e+01 1.049e+02 1.414e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 10:50:28,956 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589900 2023-11-29 10:50:31,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3932673.3333333335, ans=0.125 2023-11-29 10:50:39,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-29 10:50:43,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-11-29 10:50:50,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3932740.0, ans=0.0 2023-11-29 10:50:55,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3932740.0, ans=0.1 2023-11-29 10:50:57,726 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 750, loss[loss=0.07337, simple_loss=0.09832, pruned_loss=0.01373, audio_tagging_loss=0.01048, over 14716.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08849, pruned_loss=0.01156, audio_tagging_loss=0.008785, over 2982256.10 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:51:05,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-29 10:51:31,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 589950 2023-11-29 10:51:41,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=12.0 2023-11-29 10:51:47,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3933073.3333333335, ans=0.1 2023-11-29 10:51:59,226 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 800, loss[loss=0.06241, simple_loss=0.09085, pruned_loss=0.01088, audio_tagging_loss=0.006107, over 15484.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08985, pruned_loss=0.01182, audio_tagging_loss=0.008678, over 3001092.58 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:52:08,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3933140.0, ans=0.125 2023-11-29 10:52:32,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 9.309e+01 9.917e+01 1.087e+02 1.372e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-29 10:52:33,578 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590000 2023-11-29 10:52:38,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3933340.0, ans=0.125 2023-11-29 10:52:41,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3933340.0, ans=0.125 2023-11-29 10:53:01,567 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 850, loss[loss=0.05647, simple_loss=0.07491, pruned_loss=0.01015, audio_tagging_loss=0.008865, over 14628.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08943, pruned_loss=0.01177, audio_tagging_loss=0.008765, over 3013947.69 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:53:06,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3933473.3333333335, ans=0.125 2023-11-29 10:53:07,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3933473.3333333335, ans=0.125 2023-11-29 10:53:12,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2023-11-29 10:53:26,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=12.0 2023-11-29 10:53:28,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3933606.6666666665, ans=0.125 2023-11-29 10:53:35,636 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590050 2023-11-29 10:53:42,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=15.0 2023-11-29 10:53:55,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3933740.0, ans=0.125 2023-11-29 10:54:05,638 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 900, loss[loss=0.06106, simple_loss=0.08307, pruned_loss=0.01127, audio_tagging_loss=0.008265, over 15219.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08917, pruned_loss=0.01174, audio_tagging_loss=0.008831, over 3024937.19 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:54:26,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2023-11-29 10:54:28,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. 
limit=22.5 2023-11-29 10:54:30,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3933940.0, ans=0.125 2023-11-29 10:54:37,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.157e+01 9.744e+01 1.021e+02 1.316e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 10:54:39,068 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590100 2023-11-29 10:55:07,115 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 950, loss[loss=0.08671, simple_loss=0.1204, pruned_loss=0.01803, audio_tagging_loss=0.008471, over 14504.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08962, pruned_loss=0.01191, audio_tagging_loss=0.008745, over 3027804.54 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:55:11,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3934140.0, ans=0.125 2023-11-29 10:55:33,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3934273.3333333335, ans=0.0 2023-11-29 10:55:39,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3934273.3333333335, ans=0.1 2023-11-29 10:55:41,654 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590150 2023-11-29 10:55:56,956 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:56:09,340 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1000, loss[loss=0.0875, simple_loss=0.1212, pruned_loss=0.01846, audio_tagging_loss=0.008458, over 16442.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09006, pruned_loss=0.0121, audio_tagging_loss=0.008625, over 3031550.04 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:56:37,482 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:56:40,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 9.201e+01 9.754e+01 1.071e+02 1.435e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 10:56:42,233 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590200 2023-11-29 10:57:05,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2023-11-29 10:57:08,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3934740.0, ans=0.025 2023-11-29 10:57:08,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3934740.0, ans=0.125 2023-11-29 10:57:12,264 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1050, loss[loss=0.06029, simple_loss=0.08905, pruned_loss=0.007602, audio_tagging_loss=0.008163, over 14319.00 frames. 
], tot_loss[loss=0.06531, simple_loss=0.08958, pruned_loss=0.01195, audio_tagging_loss=0.008569, over 3031450.10 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:57:13,752 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:57:17,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3934806.6666666665, ans=0.125 2023-11-29 10:57:26,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3934873.3333333335, ans=0.125 2023-11-29 10:57:33,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3934873.3333333335, ans=15.0 2023-11-29 10:57:34,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-29 10:57:37,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3934940.0, ans=0.0 2023-11-29 10:57:45,660 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590250 2023-11-29 10:58:13,859 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1100, loss[loss=0.07679, simple_loss=0.1089, pruned_loss=0.01542, audio_tagging_loss=0.006935, over 14226.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0888, pruned_loss=0.01188, audio_tagging_loss=0.008502, over 3028858.28 frames. ], batch size: 52, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:58:16,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3935140.0, ans=0.125 2023-11-29 10:58:19,996 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:58:40,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2023-11-29 10:58:44,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3935273.3333333335, ans=0.2 2023-11-29 10:58:48,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.269e+01 9.621e+01 1.031e+02 1.312e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:58:48,207 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590300 2023-11-29 10:59:00,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3935340.0, ans=0.0 2023-11-29 10:59:06,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3935406.6666666665, ans=0.125 2023-11-29 10:59:16,236 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1150, loss[loss=0.0517, simple_loss=0.06907, pruned_loss=0.009363, audio_tagging_loss=0.007807, over 14623.00 frames. 
], tot_loss[loss=0.06461, simple_loss=0.08839, pruned_loss=0.0119, audio_tagging_loss=0.008512, over 3035234.88 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:59:23,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3935473.3333333335, ans=0.125 2023-11-29 10:59:26,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=12.0 2023-11-29 10:59:31,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3935540.0, ans=0.125 2023-11-29 10:59:42,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-29 10:59:50,119 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590350 2023-11-29 10:59:51,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3935606.6666666665, ans=0.125 2023-11-29 10:59:59,803 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:59:59,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-29 10:59:59,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2023-11-29 11:00:08,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3935740.0, ans=0.0 2023-11-29 11:00:15,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3935740.0, ans=0.125 2023-11-29 11:00:18,784 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1200, loss[loss=0.07588, simple_loss=0.1099, pruned_loss=0.01393, audio_tagging_loss=0.007009, over 14081.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.089, pruned_loss=0.01197, audio_tagging_loss=0.008398, over 3034424.64 frames. 
], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:00:19,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3935806.6666666665, ans=0.125 2023-11-29 11:00:38,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3935873.3333333335, ans=0.125 2023-11-29 11:00:47,904 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:00:51,912 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590400 2023-11-29 11:00:52,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.086e+01 9.906e+01 1.090e+02 1.794e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 11:00:58,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3936006.6666666665, ans=0.0 2023-11-29 11:01:01,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3936006.6666666665, ans=0.5 2023-11-29 11:01:12,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3936073.3333333335, ans=0.125 2023-11-29 11:01:21,024 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1250, loss[loss=0.05758, simple_loss=0.07672, pruned_loss=0.01116, audio_tagging_loss=0.008062, over 15262.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08869, pruned_loss=0.0118, audio_tagging_loss=0.008458, over 3042535.39 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:01:40,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3936206.6666666665, ans=0.05 2023-11-29 11:01:55,103 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590450 2023-11-29 11:02:22,042 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1300, loss[loss=0.06928, simple_loss=0.1003, pruned_loss=0.01023, audio_tagging_loss=0.008884, over 14849.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08836, pruned_loss=0.0117, audio_tagging_loss=0.008409, over 3038246.46 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:02:28,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3936473.3333333335, ans=0.0 2023-11-29 11:02:32,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3936473.3333333335, ans=0.125 2023-11-29 11:02:39,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. 
limit=15.0 2023-11-29 11:02:42,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3936540.0, ans=0.025 2023-11-29 11:02:53,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3936606.6666666665, ans=0.1 2023-11-29 11:02:54,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3936606.6666666665, ans=0.1 2023-11-29 11:02:55,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590500 2023-11-29 11:02:56,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.938e+01 9.408e+01 1.020e+02 1.519e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 11:03:11,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3936740.0, ans=0.125 2023-11-29 11:03:11,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3936740.0, ans=0.125 2023-11-29 11:03:13,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3936740.0, ans=0.2 2023-11-29 11:03:17,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3936740.0, ans=0.1 2023-11-29 11:03:23,221 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1350, loss[loss=0.05625, simple_loss=0.07447, pruned_loss=0.009461, audio_tagging_loss=0.009558, over 14976.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.0887, pruned_loss=0.01171, audio_tagging_loss=0.008414, over 3041715.38 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:03:29,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2023-11-29 11:03:43,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3936873.3333333335, ans=0.1 2023-11-29 11:03:47,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3936940.0, ans=0.2 2023-11-29 11:03:56,252 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590550 2023-11-29 11:04:06,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3937006.6666666665, ans=0.025 2023-11-29 11:04:11,559 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:04:25,691 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1400, loss[loss=0.0577, simple_loss=0.07998, pruned_loss=0.006865, audio_tagging_loss=0.01084, over 15338.00 frames. ], tot_loss[loss=0.064, simple_loss=0.08765, pruned_loss=0.01154, audio_tagging_loss=0.008629, over 3040319.38 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:04:35,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3937140.0, ans=0.125 2023-11-29 11:04:38,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3937206.6666666665, ans=0.0 2023-11-29 11:04:44,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3937206.6666666665, ans=0.125 2023-11-29 11:04:44,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-29 11:04:56,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3937273.3333333335, ans=0.0 2023-11-29 11:04:57,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3937273.3333333335, ans=0.1 2023-11-29 11:04:58,843 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590600 2023-11-29 11:04:58,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3937273.3333333335, ans=0.125 2023-11-29 11:04:59,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.992e+01 9.669e+01 1.038e+02 1.341e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 11:05:02,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3937340.0, ans=0.125 2023-11-29 11:05:16,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-11-29 11:05:17,980 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:05:26,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3937473.3333333335, ans=0.125 2023-11-29 11:05:26,989 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1450, loss[loss=0.04795, simple_loss=0.06429, pruned_loss=0.007145, audio_tagging_loss=0.008666, over 16182.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08907, pruned_loss=0.01171, audio_tagging_loss=0.008623, over 3049682.16 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:05:50,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3937540.0, ans=0.2 2023-11-29 11:05:56,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3937606.6666666665, ans=0.95 2023-11-29 11:06:01,118 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590650 2023-11-29 11:06:27,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3937806.6666666665, ans=0.0 2023-11-29 11:06:28,786 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1500, loss[loss=0.05535, simple_loss=0.06743, pruned_loss=0.01096, audio_tagging_loss=0.01067, over 16681.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.0893, pruned_loss=0.0118, audio_tagging_loss=0.00868, over 3051393.71 frames. 
], batch size: 68, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:06:34,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3937806.6666666665, ans=0.0 2023-11-29 11:06:34,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3937806.6666666665, ans=0.0 2023-11-29 11:07:02,085 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590700 2023-11-29 11:07:03,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 9.150e+01 9.903e+01 1.059e+02 1.485e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 11:07:03,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3937940.0, ans=10.0 2023-11-29 11:07:07,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3938006.6666666665, ans=0.1 2023-11-29 11:07:13,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3938006.6666666665, ans=0.125 2023-11-29 11:07:20,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3938073.3333333335, ans=0.2 2023-11-29 11:07:26,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3938073.3333333335, ans=0.05 2023-11-29 11:07:31,375 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1550, loss[loss=0.0818, simple_loss=0.1168, pruned_loss=0.01654, audio_tagging_loss=0.006868, over 14976.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08942, pruned_loss=0.01181, audio_tagging_loss=0.008622, over 3055565.78 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:07:39,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3938140.0, ans=0.0 2023-11-29 11:07:49,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3938206.6666666665, ans=0.1 2023-11-29 11:08:02,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2023-11-29 11:08:03,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590750 2023-11-29 11:08:12,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3938340.0, ans=0.2 2023-11-29 11:08:13,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3938340.0, ans=0.1 2023-11-29 11:08:32,432 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1600, loss[loss=0.05432, simple_loss=0.071, pruned_loss=0.007418, audio_tagging_loss=0.01141, over 15090.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0888, pruned_loss=0.01166, audio_tagging_loss=0.008716, over 3051341.85 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:08:37,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3938473.3333333335, ans=0.125 2023-11-29 11:08:45,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3938540.0, ans=0.125 2023-11-29 11:08:56,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3938606.6666666665, ans=0.125 2023-11-29 11:09:06,656 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590800 2023-11-29 11:09:07,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.890e+01 8.907e+01 9.577e+01 1.022e+02 1.784e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 11:09:15,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2023-11-29 11:09:23,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3938740.0, ans=0.125 2023-11-29 11:09:27,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3938740.0, ans=0.125 2023-11-29 11:09:34,417 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1650, loss[loss=0.05173, simple_loss=0.07115, pruned_loss=0.006765, audio_tagging_loss=0.009387, over 16379.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08913, pruned_loss=0.01179, audio_tagging_loss=0.008882, over 3057733.73 frames. ], batch size: 64, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:09:38,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-29 11:09:57,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2023-11-29 11:09:58,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3938940.0, ans=0.125 2023-11-29 11:10:05,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-11-29 11:10:08,156 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590850 2023-11-29 11:10:19,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3939006.6666666665, ans=0.125 2023-11-29 11:10:21,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3939006.6666666665, ans=0.0 2023-11-29 11:10:27,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3939073.3333333335, ans=0.0 2023-11-29 11:10:36,657 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1700, loss[loss=0.08674, simple_loss=0.1191, pruned_loss=0.01634, audio_tagging_loss=0.01084, over 15983.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08978, pruned_loss=0.01187, audio_tagging_loss=0.008876, over 3057239.57 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:10:39,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3939140.0, ans=0.2 2023-11-29 11:10:39,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=22.5 2023-11-29 11:11:04,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-29 11:11:09,714 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590900 2023-11-29 11:11:10,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.146e+01 9.599e+01 1.028e+02 1.355e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 11:11:35,176 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:11:38,459 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1750, loss[loss=0.07928, simple_loss=0.1123, pruned_loss=0.01495, audio_tagging_loss=0.008163, over 16142.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.089, pruned_loss=0.01173, audio_tagging_loss=0.008919, over 3056618.99 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:11:49,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3939540.0, ans=0.0 2023-11-29 11:11:52,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2023-11-29 11:12:12,029 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 590950 2023-11-29 11:12:12,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3939606.6666666665, ans=0.0 2023-11-29 11:12:40,141 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1800, loss[loss=0.05438, simple_loss=0.07553, pruned_loss=0.008232, audio_tagging_loss=0.008382, over 13782.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08968, pruned_loss=0.01177, audio_tagging_loss=0.008808, over 3053396.30 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:13:10,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3939940.0, ans=0.2 2023-11-29 11:13:11,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-29 11:13:13,661 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591000 2023-11-29 11:13:14,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.256e+01 9.797e+01 1.069e+02 1.253e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 11:13:25,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3940006.6666666665, ans=0.125 2023-11-29 11:13:42,553 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1850, loss[loss=0.07292, simple_loss=0.1046, pruned_loss=0.01084, audio_tagging_loss=0.009766, over 15839.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09005, pruned_loss=0.01182, audio_tagging_loss=0.008704, over 3052455.31 frames. 
2023-11-29 11:13:54,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3940206.6666666665, ans=0.1
2023-11-29 11:14:03,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3940206.6666666665, ans=0.0
2023-11-29 11:14:05,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3940273.3333333335, ans=0.125
2023-11-29 11:14:15,329 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591050
2023-11-29 11:14:43,522 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1900, loss[loss=0.05565, simple_loss=0.07942, pruned_loss=0.009912, audio_tagging_loss=0.006027, over 14479.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08926, pruned_loss=0.01167, audio_tagging_loss=0.008633, over 3050831.25 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:14:43,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3940473.3333333335, ans=0.0
2023-11-29 11:15:18,040 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591100
2023-11-29 11:15:19,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.739e+01 9.784e+01 1.081e+02 1.359e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-29 11:15:30,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0
2023-11-29 11:15:35,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3940740.0, ans=0.125
2023-11-29 11:15:46,052 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 1950, loss[loss=0.05584, simple_loss=0.07432, pruned_loss=0.009419, audio_tagging_loss=0.009258, over 14459.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08954, pruned_loss=0.01176, audio_tagging_loss=0.008579, over 3051782.49 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:15:48,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3940806.6666666665, ans=0.0
2023-11-29 11:16:00,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3940873.3333333335, ans=0.125
2023-11-29 11:16:12,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3940940.0, ans=0.0
2023-11-29 11:16:18,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591150
2023-11-29 11:16:31,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=22.5
2023-11-29 11:16:38,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3941073.3333333335, ans=0.2
2023-11-29 11:16:40,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3941073.3333333335, ans=0.125
2023-11-29 11:16:42,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3941073.3333333335, ans=0.125
2023-11-29 11:16:48,078 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2000, loss[loss=0.05972, simple_loss=0.08182, pruned_loss=0.01014, audio_tagging_loss=0.008675, over 14528.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08919, pruned_loss=0.01184, audio_tagging_loss=0.008544, over 3051208.28 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:16:57,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3941140.0, ans=10.0
2023-11-29 11:17:08,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0
2023-11-29 11:17:20,877 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591200
2023-11-29 11:17:21,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3941273.3333333335, ans=0.1
2023-11-29 11:17:21,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.211e+01 9.826e+01 1.048e+02 3.263e+02, threshold=1.965e+02, percent-clipped=1.0
2023-11-29 11:17:24,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3941340.0, ans=0.125
2023-11-29 11:17:35,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0
2023-11-29 11:17:37,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3941406.6666666665, ans=0.1
2023-11-29 11:17:42,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3941406.6666666665, ans=0.125
2023-11-29 11:17:49,474 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2050, loss[loss=0.06607, simple_loss=0.09272, pruned_loss=0.01258, audio_tagging_loss=0.007132, over 15981.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08885, pruned_loss=0.01187, audio_tagging_loss=0.008616, over 3048733.03 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:18:02,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3941540.0, ans=0.0
2023-11-29 11:18:03,875 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:18:24,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591250
2023-11-29 11:18:31,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3941673.3333333335, ans=0.125
2023-11-29 11:18:45,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3941740.0, ans=0.125
2023-11-29 11:18:46,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3941740.0, ans=0.125
2023-11-29 11:18:53,370 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2100, loss[loss=0.08432, simple_loss=0.1177, pruned_loss=0.01951, audio_tagging_loss=0.005956, over 15752.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08928, pruned_loss=0.01192, audio_tagging_loss=0.008513, over 3052003.46 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:18:56,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3941806.6666666665, ans=0.125
2023-11-29 11:18:57,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0
2023-11-29 11:19:25,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3941940.0, ans=0.2
2023-11-29 11:19:26,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591300
2023-11-29 11:19:27,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.036e+01 9.652e+01 1.066e+02 1.265e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 11:19:35,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3942006.6666666665, ans=0.125
2023-11-29 11:19:40,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3942006.6666666665, ans=0.125
2023-11-29 11:19:42,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3942073.3333333335, ans=0.04949747468305833
2023-11-29 11:19:49,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3942073.3333333335, ans=0.07
2023-11-29 11:19:54,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3942140.0, ans=0.2
2023-11-29 11:19:55,531 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2150, loss[loss=0.05412, simple_loss=0.07765, pruned_loss=0.006185, audio_tagging_loss=0.00911, over 15076.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08832, pruned_loss=0.01165, audio_tagging_loss=0.008527, over 3043010.83 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:20:04,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3942140.0, ans=0.125
2023-11-29 11:20:10,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2023-11-29 11:20:13,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3942206.6666666665, ans=0.0
2023-11-29 11:20:14,029 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:20:28,668 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591350
2023-11-29 11:20:34,434 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 11:20:45,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0
2023-11-29 11:20:56,521 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2200, loss[loss=0.07492, simple_loss=0.1045, pruned_loss=0.01334, audio_tagging_loss=0.009327, over 15940.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08907, pruned_loss=0.01179, audio_tagging_loss=0.008436, over 3039111.14 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:20:58,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5
2023-11-29 11:20:59,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3942473.3333333335, ans=0.1
2023-11-29 11:21:06,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3942473.3333333335, ans=0.1
2023-11-29 11:21:18,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3942540.0, ans=0.125
2023-11-29 11:21:19,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3942540.0, ans=0.125
2023-11-29 11:21:20,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0
2023-11-29 11:21:30,774 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591400
2023-11-29 11:21:33,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.050e+01 9.112e+01 9.556e+01 1.057e+02 1.343e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-29 11:21:51,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3942740.0, ans=0.0
2023-11-29 11:21:58,268 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2250, loss[loss=0.05393, simple_loss=0.06444, pruned_loss=0.01144, audio_tagging_loss=0.01027, over 16117.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08928, pruned_loss=0.01193, audio_tagging_loss=0.008447, over 3040818.21 frames. ], batch size: 65, lr: 1.35e-03, grad_scale: 16.0
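[Editor's note: the WARNING above drops a 1-second AudioSet cut because, after the encoder frontend's roughly 4x subsampling, 100 input frames leave only 23 output frames, fewer than the 24 BPE tokens of the dummy transcript; a transducer loss needs at least one output frame per token. A sketch of that filter, assuming the usual icefall-style frame arithmetic T_out = ((T_in - 7) // 2 + 1) // 2:]

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        t_out = ((num_frames - 7) // 2 + 1) // 2   # 100 -> 23, as in the WARNING
        return t_out >= num_tokens

    print(keep_cut(100, 24))   # False -> the cut is excluded from training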
2023-11-29 11:22:10,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3942873.3333333335, ans=0.1
2023-11-29 11:22:19,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0
2023-11-29 11:22:27,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=22.5
2023-11-29 11:22:29,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0
2023-11-29 11:22:32,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591450
2023-11-29 11:22:33,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3942940.0, ans=0.125
2023-11-29 11:22:34,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0
2023-11-29 11:22:42,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3943006.6666666665, ans=0.125
2023-11-29 11:22:44,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0
2023-11-29 11:23:01,185 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2300, loss[loss=0.08587, simple_loss=0.1192, pruned_loss=0.02023, audio_tagging_loss=0.006042, over 15505.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08946, pruned_loss=0.01192, audio_tagging_loss=0.008461, over 3044016.46 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:23:01,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3943140.0, ans=10.0
2023-11-29 11:23:08,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3943140.0, ans=0.125
2023-11-29 11:23:15,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=22.5
2023-11-29 11:23:22,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3943206.6666666665, ans=0.125
2023-11-29 11:23:22,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3943206.6666666665, ans=0.2
2023-11-29 11:23:28,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3943273.3333333335, ans=0.125
2023-11-29 11:23:31,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0
2023-11-29 11:23:32,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943273.3333333335, ans=0.1
2023-11-29 11:23:33,655 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591500
2023-11-29 11:23:36,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.045e+01 9.649e+01 1.036e+02 1.193e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 11:23:38,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5
2023-11-29 11:23:38,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0
2023-11-29 11:23:51,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0
2023-11-29 11:23:58,536 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 11:24:03,237 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2350, loss[loss=0.07584, simple_loss=0.1026, pruned_loss=0.01699, audio_tagging_loss=0.007563, over 14731.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08905, pruned_loss=0.01185, audio_tagging_loss=0.008592, over 3046469.66 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:24:24,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3943540.0, ans=0.125
2023-11-29 11:24:34,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3943606.6666666665, ans=0.0
2023-11-29 11:24:36,529 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591550
2023-11-29 11:24:48,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.78 vs. limit=10.0
2023-11-29 11:24:52,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5
2023-11-29 11:24:58,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3943740.0, ans=0.1
2023-11-29 11:25:04,326 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2400, loss[loss=0.06476, simple_loss=0.09178, pruned_loss=0.01072, audio_tagging_loss=0.008144, over 15493.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08918, pruned_loss=0.01184, audio_tagging_loss=0.008671, over 3047801.17 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
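[Editor's note: the scaling.py:1022 "Whitening" entries compare a statistic of a layer's activations against a limit (metric=18.20 vs. limit=22.5, ...); the module only intervenes, via a gradient penalty, when the metric exceeds the limit. The metric below is one plausible proxy, the ratio of the second moment to the squared first moment of the covariance eigenvalues, which equals 1.0 for perfectly whitened features; it is an assumption, not the exact scaling.py formula:]

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]          # channel covariance
        eigs = torch.linalg.eigvalsh(cov)       # eigenvalues, ascending
        return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

    x = torch.randn(1000, 384)                  # near-white features
    print(f"metric={whitening_metric(x):.2f} vs. limit=15.0")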
2023-11-29 11:25:06,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943806.6666666665, ans=0.1
2023-11-29 11:25:08,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3943806.6666666665, ans=0.0
2023-11-29 11:25:10,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=12.0
2023-11-29 11:25:15,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3943873.3333333335, ans=0.0
2023-11-29 11:25:38,250 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591600
2023-11-29 11:25:40,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 9.372e+01 9.981e+01 1.068e+02 1.267e+02, threshold=1.996e+02, percent-clipped=0.0
2023-11-29 11:25:43,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3944006.6666666665, ans=0.2
2023-11-29 11:25:44,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2023-11-29 11:25:48,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.03 vs. limit=10.0
2023-11-29 11:25:49,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=12.0
2023-11-29 11:26:02,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3944073.3333333335, ans=0.125
2023-11-29 11:26:03,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3944073.3333333335, ans=0.125
2023-11-29 11:26:06,103 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2450, loss[loss=0.0554, simple_loss=0.07158, pruned_loss=0.009936, audio_tagging_loss=0.009678, over 16187.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08935, pruned_loss=0.01178, audio_tagging_loss=0.008681, over 3048009.20 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:26:06,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3944140.0, ans=0.125
2023-11-29 11:26:15,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3944140.0, ans=0.2
2023-11-29 11:26:18,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3944206.6666666665, ans=0.125
2023-11-29 11:26:25,530 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:26:37,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
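[Editor's note: the grad_scale field in the tot_loss entries moves between 32.0, 16.0 and, a little further below, 8.0, then recovers; that is the ordinary torch.cuda.amp.GradScaler behavior under fp16 training: the scale is halved when a step overflows to inf/nan and grown back after a run of clean steps. A minimal training step for illustration (model, optimizer, batch and loss_fn are placeholders):]

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()
        scaler.step(optimizer)    # skipped, and scale halved, if grads hit inf/nan
        scaler.update()
        return loss.detach(), scaler.get_scale()   # get_scale() is the logged grad_scale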
2023-11-29 11:26:39,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591650
2023-11-29 11:26:46,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3944340.0, ans=0.125
2023-11-29 11:27:06,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3944406.6666666665, ans=0.2
2023-11-29 11:27:08,544 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2500, loss[loss=0.05912, simple_loss=0.08111, pruned_loss=0.0108, audio_tagging_loss=0.007766, over 14794.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08853, pruned_loss=0.01168, audio_tagging_loss=0.008715, over 3043153.85 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:27:15,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3944473.3333333335, ans=0.125
2023-11-29 11:27:25,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3944540.0, ans=0.0
2023-11-29 11:27:39,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3944606.6666666665, ans=0.125
2023-11-29 11:27:40,762 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591700
2023-11-29 11:27:44,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.155e+01 9.688e+01 1.051e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-29 11:27:50,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3944673.3333333335, ans=0.125
2023-11-29 11:28:08,713 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2550, loss[loss=0.05893, simple_loss=0.08451, pruned_loss=0.007763, audio_tagging_loss=0.008909, over 14701.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08875, pruned_loss=0.01173, audio_tagging_loss=0.008608, over 3048441.82 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:28:10,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3944806.6666666665, ans=0.125
2023-11-29 11:28:22,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3944873.3333333335, ans=0.0
2023-11-29 11:28:27,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3944873.3333333335, ans=0.2
2023-11-29 11:28:30,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0
2023-11-29 11:28:40,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3944940.0, ans=0.125
2023-11-29 11:28:42,533 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591750
2023-11-29 11:28:49,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3945006.6666666665, ans=0.125
2023-11-29 11:29:05,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3945073.3333333335, ans=0.125
2023-11-29 11:29:10,119 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2600, loss[loss=0.07533, simple_loss=0.1009, pruned_loss=0.01549, audio_tagging_loss=0.009405, over 15997.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08947, pruned_loss=0.01176, audio_tagging_loss=0.008543, over 3052818.72 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:29:13,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3945140.0, ans=0.125
2023-11-29 11:29:24,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0
2023-11-29 11:29:44,142 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591800
2023-11-29 11:29:47,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.895e+01 9.478e+01 1.021e+02 1.360e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-29 11:30:08,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0
2023-11-29 11:30:13,349 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2650, loss[loss=0.07529, simple_loss=0.1088, pruned_loss=0.01325, audio_tagging_loss=0.007651, over 15826.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0896, pruned_loss=0.01183, audio_tagging_loss=0.008444, over 3050510.28 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:30:20,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3945473.3333333335, ans=0.125
2023-11-29 11:30:33,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3945540.0, ans=0.125
2023-11-29 11:30:34,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3945540.0, ans=0.0
2023-11-29 11:30:45,942 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591850
2023-11-29 11:30:47,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0
2023-11-29 11:31:14,814 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2700, loss[loss=0.07778, simple_loss=0.1111, pruned_loss=0.01447, audio_tagging_loss=0.007762, over 15595.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08926, pruned_loss=0.01184, audio_tagging_loss=0.008419, over 3049117.94 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:31:17,466 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:31:19,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3945806.6666666665, ans=0.0
2023-11-29 11:31:34,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3945873.3333333335, ans=0.125
2023-11-29 11:31:41,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3945940.0, ans=0.1
2023-11-29 11:31:49,096 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591900
2023-11-29 11:31:49,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3945940.0, ans=0.0
2023-11-29 11:31:53,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 9.168e+01 9.953e+01 1.095e+02 1.462e+02, threshold=1.991e+02, percent-clipped=0.0
2023-11-29 11:32:16,464 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2750, loss[loss=0.06786, simple_loss=0.09193, pruned_loss=0.01218, audio_tagging_loss=0.009716, over 14644.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08941, pruned_loss=0.01187, audio_tagging_loss=0.008397, over 3044091.95 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 8.0
2023-11-29 11:32:38,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3946206.6666666665, ans=0.5
2023-11-29 11:32:39,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3946206.6666666665, ans=0.2
2023-11-29 11:32:49,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 591950
2023-11-29 11:32:55,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3946340.0, ans=0.0
2023-11-29 11:33:03,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3946340.0, ans=0.125
2023-11-29 11:33:10,005 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 11:33:18,156 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2800, loss[loss=0.07353, simple_loss=0.09182, pruned_loss=0.01857, audio_tagging_loss=0.009047, over 13775.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08914, pruned_loss=0.01181, audio_tagging_loss=0.008395, over 3048063.38 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:33:39,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3946540.0, ans=0.1
2023-11-29 11:33:51,422 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592000
2023-11-29 11:33:58,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 9.130e+01 9.870e+01 1.066e+02 1.963e+02, threshold=1.974e+02, percent-clipped=0.0
2023-11-29 11:34:18,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3946740.0, ans=0.0
2023-11-29 11:34:18,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3946740.0, ans=0.125
2023-11-29 11:34:19,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3946740.0, ans=0.1
2023-11-29 11:34:20,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0
2023-11-29 11:34:22,986 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2850, loss[loss=0.08859, simple_loss=0.1191, pruned_loss=0.01998, audio_tagging_loss=0.009081, over 15149.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08863, pruned_loss=0.01183, audio_tagging_loss=0.008453, over 3052179.79 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:34:32,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3946806.6666666665, ans=0.1
2023-11-29 11:34:51,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3946940.0, ans=0.0
2023-11-29 11:34:55,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3946940.0, ans=0.125
2023-11-29 11:34:56,227 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592050
2023-11-29 11:35:24,271 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2900, loss[loss=0.06201, simple_loss=0.0878, pruned_loss=0.01002, audio_tagging_loss=0.008086, over 15082.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08817, pruned_loss=0.01184, audio_tagging_loss=0.008503, over 3044879.72 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:35:24,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3947140.0, ans=0.0
2023-11-29 11:35:30,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3947140.0, ans=0.125
2023-11-29 11:35:30,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3947140.0, ans=0.0
2023-11-29 11:35:31,867 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:35:57,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0
2023-11-29 11:35:58,244 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592100
2023-11-29 11:36:02,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.122e+01 9.763e+01 1.061e+02 1.440e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-29 11:36:05,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0
2023-11-29 11:36:13,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3947406.6666666665, ans=0.0
2023-11-29 11:36:26,621 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 2950, loss[loss=0.05638, simple_loss=0.07677, pruned_loss=0.006792, audio_tagging_loss=0.0112, over 14182.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.088, pruned_loss=0.01182, audio_tagging_loss=0.008589, over 3046303.50 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:36:46,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0
2023-11-29 11:36:53,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3947606.6666666665, ans=0.1
2023-11-29 11:36:59,598 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592150
2023-11-29 11:37:14,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3947740.0, ans=0.125
2023-11-29 11:37:15,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3947740.0, ans=0.125
2023-11-29 11:37:27,915 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3000, loss[loss=0.07073, simple_loss=0.1088, pruned_loss=0.01162, audio_tagging_loss=0.004716, over 14870.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08896, pruned_loss=0.01195, audio_tagging_loss=0.008575, over 3048903.98 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:37:27,916 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-29 11:37:58,459 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3340, 4.9936, 4.5974, 5.1384], device='cuda:1')
2023-11-29 11:38:07,426 INFO [train_asr.py:1267] (1/4) Epoch 50, validation: loss=0.05782, simple_loss=0.05046, pruned_loss=0.005473, audio_tagging_loss=0.02712, over 4681554.00 frames.
2023-11-29 11:38:07,427 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-29 11:38:13,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3947806.6666666665, ans=0.0
2023-11-29 11:38:28,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3947873.3333333335, ans=0.0
2023-11-29 11:38:40,811 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592200
2023-11-29 11:38:45,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.881e+01 9.188e+01 9.766e+01 1.056e+02 1.297e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-29 11:39:09,607 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3050, loss[loss=0.08049, simple_loss=0.112, pruned_loss=0.01669, audio_tagging_loss=0.007791, over 15902.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08894, pruned_loss=0.0119, audio_tagging_loss=0.008617, over 3047329.84 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
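[Editor's note: the "Computing validation loss ... Maximum memory allocated" block above runs on a fixed cadence of training batches (here at batch 3000). A sketch of that scheduling; the helper names and the exact aggregation are assumptions for illustration:]

    import torch

    def maybe_validate(batch_idx, valid_interval, model, valid_dl, compute_loss):
        if batch_idx == 0 or batch_idx % valid_interval != 0:
            return
        model.eval()
        tot, n = 0.0, 0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch)  # per-batch frame-weighted loss
                tot += loss.item() * num_frames
                n += num_frames
        model.train()
        print(f"validation: loss={tot / n:.4g}, over {n}.00 frames.")
        print(f"Maximum memory allocated so far is {torch.cuda.max_memory_allocated() // 2**20}MB")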
], tot_loss[loss=0.06499, simple_loss=0.08894, pruned_loss=0.0119, audio_tagging_loss=0.008617, over 3047329.84 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:39:09,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3948140.0, ans=0.0 2023-11-29 11:39:29,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-11-29 11:39:42,143 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592250 2023-11-29 11:39:45,517 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:39:45,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3948340.0, ans=0.125 2023-11-29 11:39:53,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-29 11:39:56,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3948340.0, ans=0.07 2023-11-29 11:40:11,003 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3100, loss[loss=0.07116, simple_loss=0.1019, pruned_loss=0.01202, audio_tagging_loss=0.008198, over 14793.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08957, pruned_loss=0.01187, audio_tagging_loss=0.008614, over 3051964.60 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:40:13,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3948473.3333333335, ans=0.2 2023-11-29 11:40:34,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3948606.6666666665, ans=0.0 2023-11-29 11:40:35,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3948606.6666666665, ans=0.125 2023-11-29 11:40:43,941 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592300 2023-11-29 11:40:45,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3948606.6666666665, ans=0.125 2023-11-29 11:40:46,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3948673.3333333335, ans=0.125 2023-11-29 11:40:48,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 9.181e+01 9.927e+01 1.074e+02 1.864e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-29 11:40:54,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3948673.3333333335, ans=0.1 2023-11-29 11:41:12,071 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3150, loss[loss=0.07764, simple_loss=0.1068, pruned_loss=0.016, audio_tagging_loss=0.008263, over 14938.00 frames. 
], tot_loss[loss=0.06558, simple_loss=0.08974, pruned_loss=0.01197, audio_tagging_loss=0.008741, over 3054061.76 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:41:15,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2023-11-29 11:41:18,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3948806.6666666665, ans=0.0 2023-11-29 11:41:39,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3948940.0, ans=0.0 2023-11-29 11:41:40,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3948940.0, ans=15.0 2023-11-29 11:41:45,113 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592350 2023-11-29 11:41:45,257 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:41:58,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3949006.6666666665, ans=0.2 2023-11-29 11:42:05,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3949073.3333333335, ans=0.125 2023-11-29 11:42:12,832 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3200, loss[loss=0.09265, simple_loss=0.1299, pruned_loss=0.02028, audio_tagging_loss=0.007433, over 16236.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09071, pruned_loss=0.012, audio_tagging_loss=0.008665, over 3057185.28 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:42:22,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3949140.0, ans=0.125 2023-11-29 11:42:25,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3949206.6666666665, ans=0.0 2023-11-29 11:42:30,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3949206.6666666665, ans=0.125 2023-11-29 11:42:45,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3949273.3333333335, ans=0.2 2023-11-29 11:42:46,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592400 2023-11-29 11:42:49,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3949340.0, ans=0.0 2023-11-29 11:42:49,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3949340.0, ans=0.125 2023-11-29 11:42:52,063 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.966e+01 9.651e+01 1.039e+02 1.549e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:42:53,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3949340.0, ans=0.125 2023-11-29 11:42:55,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3949340.0, ans=0.125 2023-11-29 11:42:57,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3949340.0, ans=0.125 2023-11-29 11:43:06,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3949406.6666666665, ans=0.1 2023-11-29 11:43:09,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-11-29 11:43:13,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3949406.6666666665, ans=0.125 2023-11-29 11:43:15,828 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3250, loss[loss=0.0417, simple_loss=0.05479, pruned_loss=0.006045, audio_tagging_loss=0.008257, over 14350.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09011, pruned_loss=0.0118, audio_tagging_loss=0.008734, over 3061826.67 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:43:18,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-29 11:43:22,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. 
limit=6.0 2023-11-29 11:43:34,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3949540.0, ans=0.05 2023-11-29 11:43:49,419 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592450 2023-11-29 11:44:17,906 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3300, loss[loss=0.08135, simple_loss=0.1101, pruned_loss=0.01666, audio_tagging_loss=0.009651, over 15942.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08996, pruned_loss=0.01177, audio_tagging_loss=0.008821, over 3060088.34 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:44:21,035 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:44:21,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3949806.6666666665, ans=0.125 2023-11-29 11:44:29,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3949873.3333333335, ans=0.0 2023-11-29 11:44:49,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3949940.0, ans=0.125 2023-11-29 11:44:51,425 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592500 2023-11-29 11:44:56,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 9.045e+01 9.733e+01 1.044e+02 1.292e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 11:44:59,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3950006.6666666665, ans=0.0 2023-11-29 11:45:04,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2023-11-29 11:45:16,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3950073.3333333335, ans=0.1 2023-11-29 11:45:17,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 2023-11-29 11:45:20,723 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3350, loss[loss=0.06547, simple_loss=0.08997, pruned_loss=0.01139, audio_tagging_loss=0.009099, over 15371.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08978, pruned_loss=0.01182, audio_tagging_loss=0.008754, over 3057718.56 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:45:23,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-29 11:45:53,723 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592550 2023-11-29 11:46:06,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3950340.0, ans=0.0 2023-11-29 11:46:22,701 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3400, loss[loss=0.07084, simple_loss=0.08992, pruned_loss=0.01627, audio_tagging_loss=0.009615, over 14807.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09045, pruned_loss=0.01189, audio_tagging_loss=0.008601, over 3057221.49 frames. 
], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:46:36,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3950540.0, ans=0.125 2023-11-29 11:46:47,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2023-11-29 11:46:56,821 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592600 2023-11-29 11:46:56,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3950606.6666666665, ans=0.1 2023-11-29 11:47:01,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.007e+01 9.772e+01 1.033e+02 1.333e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 11:47:02,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2023-11-29 11:47:24,683 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3450, loss[loss=0.06318, simple_loss=0.09105, pruned_loss=0.01262, audio_tagging_loss=0.005033, over 15552.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09053, pruned_loss=0.01193, audio_tagging_loss=0.008578, over 3049033.47 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:47:26,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3950806.6666666665, ans=0.125 2023-11-29 11:47:45,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3950873.3333333335, ans=0.1 2023-11-29 11:47:58,766 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592650 2023-11-29 11:48:07,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3951006.6666666665, ans=0.125 2023-11-29 11:48:08,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2023-11-29 11:48:27,055 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3500, loss[loss=0.07836, simple_loss=0.09728, pruned_loss=0.01986, audio_tagging_loss=0.009864, over 15391.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09061, pruned_loss=0.01205, audio_tagging_loss=0.008454, over 3050681.39 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:48:31,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-11-29 11:48:39,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-11-29 11:48:45,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3951206.6666666665, ans=0.125 2023-11-29 11:48:58,864 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:49:00,111 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592700 2023-11-29 11:49:05,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.064e+01 9.893e+01 1.052e+02 1.385e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 11:49:23,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3951406.6666666665, ans=0.2 2023-11-29 11:49:25,949 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:49:29,236 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3550, loss[loss=0.06904, simple_loss=0.09237, pruned_loss=0.0151, audio_tagging_loss=0.007749, over 13861.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08994, pruned_loss=0.01201, audio_tagging_loss=0.00848, over 3053845.56 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:49:43,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3951540.0, ans=0.1 2023-11-29 11:49:55,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2023-11-29 11:50:02,914 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592750 2023-11-29 11:50:08,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3951673.3333333335, ans=0.125 2023-11-29 11:50:11,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3951673.3333333335, ans=0.0 2023-11-29 11:50:15,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3951673.3333333335, ans=0.0 2023-11-29 11:50:17,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3951740.0, ans=0.1 2023-11-29 11:50:30,390 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3600, loss[loss=0.06957, simple_loss=0.09153, pruned_loss=0.01504, audio_tagging_loss=0.008764, over 16081.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08966, pruned_loss=0.01216, audio_tagging_loss=0.008457, over 3053981.05 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:51:03,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3951940.0, ans=0.125 2023-11-29 11:51:04,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592800 2023-11-29 11:51:09,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.101e+01 9.681e+01 1.023e+02 1.277e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 11:51:33,150 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3650, loss[loss=0.06848, simple_loss=0.09565, pruned_loss=0.01155, audio_tagging_loss=0.009102, over 14942.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08966, pruned_loss=0.01206, audio_tagging_loss=0.008413, over 3056636.68 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:51:34,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3952140.0, ans=0.0 2023-11-29 11:51:47,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3952206.6666666665, ans=0.0 2023-11-29 11:51:48,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3952206.6666666665, ans=0.0 2023-11-29 11:52:06,073 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592850 2023-11-29 11:52:12,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3952340.0, ans=0.125 2023-11-29 11:52:14,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3952340.0, ans=0.0 2023-11-29 11:52:17,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-11-29 11:52:35,279 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3700, loss[loss=0.07531, simple_loss=0.09805, pruned_loss=0.01577, audio_tagging_loss=0.01051, over 15857.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08934, pruned_loss=0.01204, audio_tagging_loss=0.008447, over 3059929.10 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:52:37,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3952473.3333333335, ans=0.125 2023-11-29 11:52:39,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3952473.3333333335, ans=0.0 2023-11-29 11:52:42,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3952473.3333333335, ans=0.125 2023-11-29 11:52:48,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3952540.0, ans=0.125 2023-11-29 11:53:08,684 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592900 2023-11-29 11:53:14,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 9.192e+01 9.964e+01 1.058e+02 1.278e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-29 11:53:17,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3952673.3333333335, ans=0.09899494936611666 2023-11-29 11:53:36,645 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3750, loss[loss=0.05621, simple_loss=0.07796, pruned_loss=0.008352, audio_tagging_loss=0.008877, over 14792.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08967, pruned_loss=0.01202, audio_tagging_loss=0.008385, over 3059177.30 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:53:43,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. 
limit=15.0 2023-11-29 11:53:43,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3952806.6666666665, ans=0.125 2023-11-29 11:54:10,903 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 592950 2023-11-29 11:54:20,525 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:54:25,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3953073.3333333335, ans=0.1 2023-11-29 11:54:38,442 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3800, loss[loss=0.06805, simple_loss=0.0919, pruned_loss=0.0146, audio_tagging_loss=0.007496, over 15153.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08976, pruned_loss=0.01205, audio_tagging_loss=0.008402, over 3059626.18 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:55:10,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3953273.3333333335, ans=0.125 2023-11-29 11:55:12,156 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593000 2023-11-29 11:55:15,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3953340.0, ans=0.125 2023-11-29 11:55:18,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 9.089e+01 9.885e+01 1.067e+02 1.488e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 11:55:31,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3953406.6666666665, ans=0.125 2023-11-29 11:55:41,784 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3850, loss[loss=0.06522, simple_loss=0.0916, pruned_loss=0.01251, audio_tagging_loss=0.006915, over 14262.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08959, pruned_loss=0.01196, audio_tagging_loss=0.008539, over 3061112.39 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:55:47,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3953473.3333333335, ans=0.125 2023-11-29 11:56:04,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3953606.6666666665, ans=0.0 2023-11-29 11:56:14,587 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593050 2023-11-29 11:56:32,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0 2023-11-29 11:56:43,351 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3900, loss[loss=0.05623, simple_loss=0.06955, pruned_loss=0.01193, audio_tagging_loss=0.009523, over 15604.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09024, pruned_loss=0.01206, audio_tagging_loss=0.008512, over 3051065.94 frames. 
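Each optim.py:476 line summarizes recent gradient norms as five quantiles (min, 25%, middle, 75%, max), and the clipping threshold tracks Clipping_scale times the middle value, the running median (here 2.0 x 9.885e+01 = 1.977e+02), so only outlier batches get clipped and percent-clipped stays at 0.0. A rough sketch of median-based clipping, assuming a simple deque of recent norms rather than the optimizer's actual estimator:

    import collections
    import torch

    recent = collections.deque(maxlen=200)  # recent gradient norms

    def clip_gradients(params, clipping_scale: float = 2.0) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])).item()
        recent.append(norm)
        threshold = clipping_scale * sorted(recent)[len(recent) // 2]  # median
        if norm > threshold:  # this is what "percent-clipped" counts
            for g in grads:
                g.mul_(threshold / norm)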
], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:56:46,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-29 11:57:17,576 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593100 2023-11-29 11:57:23,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.927e+01 9.561e+01 1.012e+02 1.625e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 11:57:23,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0 2023-11-29 11:57:45,092 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 3950, loss[loss=0.06568, simple_loss=0.08603, pruned_loss=0.01174, audio_tagging_loss=0.01093, over 14563.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09063, pruned_loss=0.01215, audio_tagging_loss=0.008649, over 3052165.00 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:57:51,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3954140.0, ans=0.1 2023-11-29 11:58:18,308 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593150 2023-11-29 11:58:30,012 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:58:32,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3954340.0, ans=0.0 2023-11-29 11:58:39,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3954406.6666666665, ans=0.2 2023-11-29 11:58:47,973 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4000, loss[loss=0.06978, simple_loss=0.09772, pruned_loss=0.01251, audio_tagging_loss=0.008411, over 15530.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09017, pruned_loss=0.01196, audio_tagging_loss=0.00871, over 3046209.38 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:58:56,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3954473.3333333335, ans=0.125 2023-11-29 11:59:04,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3954540.0, ans=0.125 2023-11-29 11:59:05,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2023-11-29 11:59:20,456 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593200 2023-11-29 11:59:20,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3954606.6666666665, ans=0.5 2023-11-29 11:59:22,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=15.0 2023-11-29 11:59:26,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.870e+01 8.877e+01 9.527e+01 1.031e+02 1.352e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 11:59:41,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3954740.0, ans=0.125 2023-11-29 11:59:49,343 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4050, loss[loss=0.08085, simple_loss=0.1095, pruned_loss=0.01829, audio_tagging_loss=0.007801, over 14800.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08994, pruned_loss=0.01192, audio_tagging_loss=0.008822, over 3044627.86 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:59:53,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3954806.6666666665, ans=0.125 2023-11-29 11:59:54,016 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:00:16,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3954940.0, ans=0.125 2023-11-29 12:00:22,995 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593250 2023-11-29 12:00:23,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3954940.0, ans=0.125 2023-11-29 12:00:32,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3955006.6666666665, ans=0.0 2023-11-29 12:00:51,370 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4100, loss[loss=0.06464, simple_loss=0.09113, pruned_loss=0.01058, audio_tagging_loss=0.008496, over 14911.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09037, pruned_loss=0.01196, audio_tagging_loss=0.008762, over 3053973.99 frames. 
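The scaling.py:213 lines trace ScheduledFloat values: scalar hyperparameters (dropout_p, skip rates, balancer probs) defined as piecewise-linear functions of the global batch count rather than constants, which is why the same parameter name keeps logging ans=0.1 or ans=0.125 here, far past any ramp. A minimal sketch of such a schedule (the class interface and the example breakpoints are assumptions, not the exact icefall code):

    import bisect

    class PiecewiseLinearSchedule:
        """A scalar that is a piecewise-linear function of batch count."""
        def __init__(self, *points):  # points: sorted (batch_count, value)
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(3.95e6) == 0.1  # long past the ramp, as logged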
], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:00:56,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:00:57,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3955140.0, ans=0.0 2023-11-29 12:01:06,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3955206.6666666665, ans=0.2 2023-11-29 12:01:13,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3955206.6666666665, ans=0.125 2023-11-29 12:01:17,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3955273.3333333335, ans=0.125 2023-11-29 12:01:24,881 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593300 2023-11-29 12:01:31,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.221e+01 9.823e+01 1.065e+02 1.481e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 12:01:38,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3955340.0, ans=0.125 2023-11-29 12:01:44,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3955406.6666666665, ans=0.125 2023-11-29 12:01:52,936 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4150, loss[loss=0.06464, simple_loss=0.09087, pruned_loss=0.01166, audio_tagging_loss=0.007545, over 14992.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09033, pruned_loss=0.01211, audio_tagging_loss=0.008713, over 3058221.09 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:01:54,482 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:02:13,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3955540.0, ans=0.0 2023-11-29 12:02:14,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=22.5 2023-11-29 12:02:25,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2023-11-29 12:02:26,185 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593350 2023-11-29 12:02:38,441 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:02:46,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3955740.0, ans=0.125 2023-11-29 12:02:47,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.97 vs. 
limit=15.0 2023-11-29 12:02:54,800 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4200, loss[loss=0.05228, simple_loss=0.07027, pruned_loss=0.009884, audio_tagging_loss=0.007268, over 14517.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0899, pruned_loss=0.01191, audio_tagging_loss=0.008657, over 3055697.38 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:03:00,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3955806.6666666665, ans=0.125 2023-11-29 12:03:08,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3955873.3333333335, ans=0.05 2023-11-29 12:03:18,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3955940.0, ans=0.0 2023-11-29 12:03:28,490 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593400 2023-11-29 12:03:35,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.099e+01 9.882e+01 1.051e+02 1.333e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 12:03:43,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3956073.3333333335, ans=0.025 2023-11-29 12:03:54,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3956073.3333333335, ans=0.125 2023-11-29 12:03:55,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3956140.0, ans=0.0 2023-11-29 12:03:56,521 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4250, loss[loss=0.05669, simple_loss=0.07617, pruned_loss=0.009072, audio_tagging_loss=0.009532, over 15597.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09019, pruned_loss=0.01212, audio_tagging_loss=0.008518, over 3055060.05 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:04:05,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3956140.0, ans=0.125 2023-11-29 12:04:30,860 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593450 2023-11-29 12:04:35,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3956340.0, ans=0.2 2023-11-29 12:04:37,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3956340.0, ans=0.125 2023-11-29 12:04:37,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3956340.0, ans=0.125 2023-11-29 12:04:44,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3956340.0, ans=0.125 2023-11-29 12:04:55,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3956406.6666666665, ans=0.125 2023-11-29 12:04:58,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3956473.3333333335, ans=0.0 2023-11-29 12:04:58,927 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4300, loss[loss=0.06153, simple_loss=0.08714, pruned_loss=0.01134, audio_tagging_loss=0.006616, over 14670.00 frames. 
], tot_loss[loss=0.06568, simple_loss=0.09037, pruned_loss=0.01206, audio_tagging_loss=0.008437, over 3049447.68 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:05:08,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3956473.3333333335, ans=0.0 2023-11-29 12:05:14,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3956540.0, ans=0.125 2023-11-29 12:05:29,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3956606.6666666665, ans=0.125 2023-11-29 12:05:32,499 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593500 2023-11-29 12:05:38,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3956673.3333333335, ans=0.125 2023-11-29 12:05:40,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 9.077e+01 9.622e+01 1.047e+02 1.414e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 12:05:51,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3956740.0, ans=0.0 2023-11-29 12:05:58,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3956740.0, ans=0.125 2023-11-29 12:06:00,191 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4350, loss[loss=0.04853, simple_loss=0.06874, pruned_loss=0.006908, audio_tagging_loss=0.007255, over 14798.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.0909, pruned_loss=0.01211, audio_tagging_loss=0.008346, over 3051716.86 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:06:29,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2023-11-29 12:06:33,059 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593550 2023-11-29 12:06:48,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2023-11-29 12:07:01,224 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:07:02,014 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4400, loss[loss=0.06703, simple_loss=0.09704, pruned_loss=0.01113, audio_tagging_loss=0.007386, over 15515.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09148, pruned_loss=0.01222, audio_tagging_loss=0.008217, over 3049955.91 frames. 
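The loss fields in these lines are consistent with a fixed weighted sum, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. a simple-loss scale of 0.5 with the pruned and audio-tagging terms at full weight; the tot_loss bracket is the same combination averaged over recent batches. Checking this against batch 4300 above:

    simple, pruned, tagging = 0.08714, 0.01134, 0.006616
    loss = 0.5 * simple + pruned + tagging  # = 0.061526
    assert abs(loss - 0.06153) < 5e-5       # matches the logged loss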
], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:07:13,591 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:07:26,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3957273.3333333335, ans=0.1 2023-11-29 12:07:36,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593600 2023-11-29 12:07:45,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.188e+01 9.758e+01 1.053e+02 1.476e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 12:07:59,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=22.5 2023-11-29 12:08:05,321 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4450, loss[loss=0.05192, simple_loss=0.07191, pruned_loss=0.008343, audio_tagging_loss=0.007622, over 15329.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09193, pruned_loss=0.01218, audio_tagging_loss=0.008183, over 3054294.92 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:08:18,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-29 12:08:38,793 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593650 2023-11-29 12:08:52,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3957673.3333333335, ans=0.125 2023-11-29 12:08:54,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3957740.0, ans=0.125 2023-11-29 12:09:07,771 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4500, loss[loss=0.07073, simple_loss=0.1025, pruned_loss=0.01282, audio_tagging_loss=0.006688, over 15498.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09128, pruned_loss=0.01222, audio_tagging_loss=0.008264, over 3054427.43 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:09:12,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3957806.6666666665, ans=0.2 2023-11-29 12:09:12,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. 
limit=6.0 2023-11-29 12:09:31,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3957940.0, ans=0.125 2023-11-29 12:09:33,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3957940.0, ans=0.0 2023-11-29 12:09:39,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3957940.0, ans=0.0 2023-11-29 12:09:41,405 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593700 2023-11-29 12:09:50,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.149e+01 9.833e+01 1.069e+02 1.731e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 12:09:50,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3958006.6666666665, ans=0.125 2023-11-29 12:09:56,146 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:10:01,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3958073.3333333335, ans=0.125 2023-11-29 12:10:04,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3958073.3333333335, ans=0.0 2023-11-29 12:10:08,748 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4550, loss[loss=0.06064, simple_loss=0.08141, pruned_loss=0.01073, audio_tagging_loss=0.009204, over 16269.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09088, pruned_loss=0.0122, audio_tagging_loss=0.00831, over 3052588.47 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:10:24,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=3958206.6666666665, ans=15.0 2023-11-29 12:10:31,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3958206.6666666665, ans=0.125 2023-11-29 12:10:43,161 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593750 2023-11-29 12:10:43,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3958273.3333333335, ans=0.125 2023-11-29 12:10:56,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3958340.0, ans=0.2 2023-11-29 12:10:57,178 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:11:00,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3958406.6666666665, ans=0.04949747468305833 2023-11-29 12:11:11,238 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4600, loss[loss=0.06587, simple_loss=0.08449, pruned_loss=0.0125, audio_tagging_loss=0.01112, over 14904.00 frames. 
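The "Whitening: ... metric=X vs. limit=Y" lines measure how far a layer's output covariance is from a multiple of the identity; the module leaves the forward pass untouched and only adds a corrective gradient when the metric exceeds its (scheduled) limit, which is why most of these lines report a metric below the limit. One plausible form of such a metric, equal to 1.0 for perfectly white features (a sketch of the idea, not the exact scaling.py code):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns 1.0 when the covariance is
        # proportional to the identity, and grows as channels correlate.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        return (d * (cov ** 2).sum() / cov.trace() ** 2).item()

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(10000, 192)))  # close to 1: near-white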
], tot_loss[loss=0.06542, simple_loss=0.09007, pruned_loss=0.01199, audio_tagging_loss=0.008393, over 3046541.66 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:11:13,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2023-11-29 12:11:44,163 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593800 2023-11-29 12:11:44,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3958606.6666666665, ans=0.0 2023-11-29 12:11:50,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3958673.3333333335, ans=0.125 2023-11-29 12:11:53,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.081e+01 9.672e+01 1.036e+02 1.224e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 12:12:13,812 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4650, loss[loss=0.06292, simple_loss=0.08739, pruned_loss=0.01055, audio_tagging_loss=0.008672, over 14515.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.0899, pruned_loss=0.01197, audio_tagging_loss=0.008541, over 3044276.54 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:12:43,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3958940.0, ans=0.2 2023-11-29 12:12:46,298 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593850 2023-11-29 12:13:06,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3959073.3333333335, ans=0.0 2023-11-29 12:13:14,371 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4700, loss[loss=0.06909, simple_loss=0.1002, pruned_loss=0.0104, audio_tagging_loss=0.008591, over 15432.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08952, pruned_loss=0.01189, audio_tagging_loss=0.008587, over 3046867.63 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:13:16,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3959140.0, ans=0.0 2023-11-29 12:13:42,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-29 12:13:48,594 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593900 2023-11-29 12:13:48,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3959273.3333333335, ans=0.125 2023-11-29 12:13:56,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.196e+01 9.820e+01 1.091e+02 1.389e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 12:14:02,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3959406.6666666665, ans=0.0 2023-11-29 12:14:03,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-29 12:14:16,800 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4750, loss[loss=0.06684, simple_loss=0.09251, pruned_loss=0.01221, audio_tagging_loss=0.008382, over 14657.00 frames. 
], tot_loss[loss=0.06491, simple_loss=0.08877, pruned_loss=0.01183, audio_tagging_loss=0.008705, over 3042130.41 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:14:23,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3959473.3333333335, ans=0.0 2023-11-29 12:14:49,614 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 593950 2023-11-29 12:14:58,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3959673.3333333335, ans=0.1 2023-11-29 12:15:06,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3959740.0, ans=0.125 2023-11-29 12:15:19,312 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4800, loss[loss=0.07253, simple_loss=0.0925, pruned_loss=0.01307, audio_tagging_loss=0.01321, over 14767.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.0886, pruned_loss=0.01173, audio_tagging_loss=0.008863, over 3041605.87 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:15:38,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3959873.3333333335, ans=0.125 2023-11-29 12:15:46,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3959940.0, ans=0.1 2023-11-29 12:15:52,341 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594000 2023-11-29 12:15:59,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3960006.6666666665, ans=0.035 2023-11-29 12:16:00,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3960006.6666666665, ans=0.125 2023-11-29 12:16:01,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 9.011e+01 9.691e+01 1.047e+02 1.422e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 12:16:03,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3960006.6666666665, ans=0.125 2023-11-29 12:16:10,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3960073.3333333335, ans=0.0 2023-11-29 12:16:20,314 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4850, loss[loss=0.06956, simple_loss=0.09121, pruned_loss=0.01337, audio_tagging_loss=0.01058, over 15442.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08944, pruned_loss=0.01181, audio_tagging_loss=0.008912, over 3044540.77 frames. 
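The grad_scale field in the batch lines (32.0 down to 8.0 around batch 4300, then back up to 32.0 by batch 4800) is fp16 dynamic loss scaling: the scale is halved whenever a step produces inf/NaN gradients and doubled again after a long run of clean steps. The same mechanism is available as torch's AMP scaler; a sketch with assumed, default-style parameters:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,     # matches the grad_scale values logged here
        backoff_factor=0.5,  # halve the scale on inf/NaN gradients
        growth_factor=2.0,   # double it after `growth_interval` clean steps
        growth_interval=2000,
    )

    # Typical training step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # adjusts the scale as described above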
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:16:37,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3960206.6666666665, ans=0.0 2023-11-29 12:16:54,302 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594050 2023-11-29 12:17:00,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3960340.0, ans=0.125 2023-11-29 12:17:03,752 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:17:19,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3960406.6666666665, ans=0.125 2023-11-29 12:17:21,452 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4900, loss[loss=0.07129, simple_loss=0.1005, pruned_loss=0.0136, audio_tagging_loss=0.007452, over 15272.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08971, pruned_loss=0.01166, audio_tagging_loss=0.008859, over 3038212.20 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:17:43,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3960540.0, ans=0.0 2023-11-29 12:17:55,306 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594100 2023-11-29 12:18:04,659 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.116e+01 9.769e+01 1.041e+02 2.380e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 12:18:06,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3960673.3333333335, ans=0.125 2023-11-29 12:18:09,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-29 12:18:24,997 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 4950, loss[loss=0.102, simple_loss=0.1421, pruned_loss=0.02543, audio_tagging_loss=0.005557, over 14871.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08882, pruned_loss=0.01168, audio_tagging_loss=0.008751, over 3036236.76 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:18:57,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594150 2023-11-29 12:19:04,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3961006.6666666665, ans=0.125 2023-11-29 12:19:08,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3961006.6666666665, ans=0.09899494936611666 2023-11-29 12:19:18,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3961073.3333333335, ans=0.125 2023-11-29 12:19:26,301 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5000, loss[loss=0.0585, simple_loss=0.08027, pruned_loss=0.01003, audio_tagging_loss=0.008336, over 14813.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08863, pruned_loss=0.01154, audio_tagging_loss=0.008541, over 3037960.01 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:19:31,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs. 
limit=15.0 2023-11-29 12:19:33,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3961140.0, ans=0.125 2023-11-29 12:19:59,640 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594200 2023-11-29 12:20:03,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3961340.0, ans=0.125 2023-11-29 12:20:09,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.950e+01 9.411e+01 1.015e+02 1.285e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 12:20:09,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3961340.0, ans=0.0 2023-11-29 12:20:25,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2023-11-29 12:20:27,682 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5050, loss[loss=0.05331, simple_loss=0.07605, pruned_loss=0.006793, audio_tagging_loss=0.008494, over 13856.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08894, pruned_loss=0.01163, audio_tagging_loss=0.008445, over 3039024.32 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:20:31,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3961473.3333333335, ans=0.125 2023-11-29 12:20:34,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-29 12:21:01,510 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594250 2023-11-29 12:21:30,080 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5100, loss[loss=0.05713, simple_loss=0.07592, pruned_loss=0.01007, audio_tagging_loss=0.009104, over 14499.00 frames. ], tot_loss[loss=0.06356, simple_loss=0.08745, pruned_loss=0.01137, audio_tagging_loss=0.008459, over 3036993.83 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:21:34,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3961806.6666666665, ans=0.0 2023-11-29 12:21:57,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2023-11-29 12:22:03,827 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594300 2023-11-29 12:22:06,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2023-11-29 12:22:13,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.985e+01 9.588e+01 1.015e+02 1.337e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:22:14,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-29 12:22:32,640 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5150, loss[loss=0.05388, simple_loss=0.07139, pruned_loss=0.009591, audio_tagging_loss=0.008587, over 14680.00 frames. ], tot_loss[loss=0.06362, simple_loss=0.08742, pruned_loss=0.0115, audio_tagging_loss=0.008406, over 3036224.14 frames. 
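The balancer entries (balancer1.prob, min_positive, max_positive, min_abs) belong to modules that regulate per-channel activation statistics. They are identity functions in the forward pass; in the backward pass they add a small corrective gradient to channels whose fraction of positive values (or mean absolute value) drifts outside the configured range, and prob (scheduled, 0.125 here) is the chance the check runs on a given batch. A toy version of the backward-only correction (a sketch of the mechanism, not the real module):

    import torch

    class BalancePositiveFraction(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive=0.05, max_positive=0.95, eps=1e-4):
            frac_pos = (x > 0).float().mean(dim=0)  # per channel, x: (N, C)
            ctx.save_for_backward(frac_pos)
            ctx.min_positive, ctx.max_positive, ctx.eps = \
                min_positive, max_positive, eps
            return x  # identity in the forward direction

        @staticmethod
        def backward(ctx, grad):
            (frac_pos,) = ctx.saved_tensors
            # Push channels with too few positive values up, too many down.
            push = (frac_pos < ctx.min_positive).float() \
                - (frac_pos > ctx.max_positive).float()
            return grad - ctx.eps * grad.abs().mean() * push, None, None, None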
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:22:36,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3962140.0, ans=0.0 2023-11-29 12:22:43,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3962206.6666666665, ans=0.0 2023-11-29 12:22:48,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3962206.6666666665, ans=0.1 2023-11-29 12:22:57,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3962273.3333333335, ans=0.1 2023-11-29 12:23:06,603 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594350 2023-11-29 12:23:12,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-29 12:23:31,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3962406.6666666665, ans=0.0 2023-11-29 12:23:34,683 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5200, loss[loss=0.07719, simple_loss=0.1049, pruned_loss=0.01715, audio_tagging_loss=0.007574, over 15082.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08871, pruned_loss=0.01145, audio_tagging_loss=0.00839, over 3044212.10 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:23:49,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2023-11-29 12:24:08,916 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594400 2023-11-29 12:24:18,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.283e+01 9.729e+01 1.049e+02 1.320e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 12:24:22,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3962673.3333333335, ans=0.0 2023-11-29 12:24:37,178 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5250, loss[loss=0.05792, simple_loss=0.07615, pruned_loss=0.01033, audio_tagging_loss=0.009519, over 14877.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08938, pruned_loss=0.01175, audio_tagging_loss=0.008388, over 3045007.77 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:24:37,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3962806.6666666665, ans=0.125 2023-11-29 12:24:37,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. 
limit=15.0 2023-11-29 12:24:42,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3962806.6666666665, ans=0.0 2023-11-29 12:24:43,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3962806.6666666665, ans=0.1 2023-11-29 12:24:45,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3962806.6666666665, ans=0.125 2023-11-29 12:24:58,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3962873.3333333335, ans=0.2 2023-11-29 12:25:03,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3962940.0, ans=0.1 2023-11-29 12:25:10,220 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594450 2023-11-29 12:25:12,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3962940.0, ans=0.0 2023-11-29 12:25:20,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3963006.6666666665, ans=0.125 2023-11-29 12:25:39,473 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5300, loss[loss=0.0473, simple_loss=0.0615, pruned_loss=0.006602, audio_tagging_loss=0.009949, over 15546.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08861, pruned_loss=0.01172, audio_tagging_loss=0.008409, over 3052792.26 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:25:45,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3963140.0, ans=0.0 2023-11-29 12:25:51,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3963206.6666666665, ans=0.125 2023-11-29 12:26:03,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2023-11-29 12:26:13,223 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594500 2023-11-29 12:26:22,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.148e+01 9.632e+01 1.017e+02 1.264e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 12:26:36,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3963406.6666666665, ans=0.0 2023-11-29 12:26:41,286 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5350, loss[loss=0.07249, simple_loss=0.1083, pruned_loss=0.00947, audio_tagging_loss=0.008882, over 14981.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08848, pruned_loss=0.01162, audio_tagging_loss=0.008404, over 3051987.32 frames. 
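Several of the scheduled values above are skip rates (attention_skip_rate, conv_skip_rate, ff2_skip_rate, ...), all reading ans=0.0 by now: early in training, whole sub-modules are randomly bypassed with these probabilities as a regularizer, and the schedules decay to zero later on, as seen here around batch count 3.96M. A minimal sketch of that idea (the function name is ours):

    import torch

    def maybe_skip(module, x, skip_rate: float):
        # With probability skip_rate during training, bypass the sub-module
        # entirely; the schedules logged above have decayed this to 0.0.
        if module.training and torch.rand(()) < skip_rate:
            return x
        return module(x)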
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:26:42,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3963473.3333333335, ans=0.125 2023-11-29 12:27:07,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3963606.6666666665, ans=0.0 2023-11-29 12:27:15,506 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594550 2023-11-29 12:27:25,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3963673.3333333335, ans=0.125 2023-11-29 12:27:38,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3963740.0, ans=0.2 2023-11-29 12:27:42,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-29 12:27:43,675 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5400, loss[loss=0.07162, simple_loss=0.1091, pruned_loss=0.01061, audio_tagging_loss=0.006474, over 15864.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08883, pruned_loss=0.01164, audio_tagging_loss=0.008408, over 3058437.01 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:27:47,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3963806.6666666665, ans=0.125 2023-11-29 12:27:53,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3963806.6666666665, ans=0.1 2023-11-29 12:27:56,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3963873.3333333335, ans=0.1 2023-11-29 12:28:16,352 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594600 2023-11-29 12:28:26,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 9.096e+01 9.650e+01 1.029e+02 1.446e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 12:28:29,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3964006.6666666665, ans=0.125 2023-11-29 12:28:30,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964006.6666666665, ans=0.1 2023-11-29 12:28:43,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3964073.3333333335, ans=0.0 2023-11-29 12:28:45,306 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5450, loss[loss=0.04839, simple_loss=0.06719, pruned_loss=0.006775, audio_tagging_loss=0.008022, over 14585.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08965, pruned_loss=0.01182, audio_tagging_loss=0.008354, over 3056774.44 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:28:51,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3964140.0, ans=0.125 2023-11-29 12:28:58,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. 
limit=15.0 2023-11-29 12:29:03,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3964206.6666666665, ans=0.0 2023-11-29 12:29:09,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3964273.3333333335, ans=0.125 2023-11-29 12:29:15,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964273.3333333335, ans=0.1 2023-11-29 12:29:19,090 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594650 2023-11-29 12:29:23,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964340.0, ans=0.1 2023-11-29 12:29:37,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3964406.6666666665, ans=0.0 2023-11-29 12:29:47,562 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5500, loss[loss=0.07256, simple_loss=0.09801, pruned_loss=0.01427, audio_tagging_loss=0.009287, over 15953.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08977, pruned_loss=0.01187, audio_tagging_loss=0.008466, over 3052590.03 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:29:48,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-11-29 12:29:51,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3964473.3333333335, ans=0.0 2023-11-29 12:30:05,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2023-11-29 12:30:15,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3964606.6666666665, ans=0.125 2023-11-29 12:30:21,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594700 2023-11-29 12:30:32,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 9.260e+01 9.828e+01 1.052e+02 2.145e+02, threshold=1.966e+02, percent-clipped=1.0 2023-11-29 12:30:39,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3964740.0, ans=0.0 2023-11-29 12:30:49,514 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5550, loss[loss=0.05726, simple_loss=0.06495, pruned_loss=0.01055, audio_tagging_loss=0.01424, over 15091.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08946, pruned_loss=0.01195, audio_tagging_loss=0.008597, over 3047673.32 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:31:06,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3964873.3333333335, ans=0.0 2023-11-29 12:31:08,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3964873.3333333335, ans=0.125 2023-11-29 12:31:17,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3964940.0, ans=0.125 2023-11-29 12:31:21,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. 
limit=6.0 2023-11-29 12:31:22,521 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594750 2023-11-29 12:31:31,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3965006.6666666665, ans=0.125 2023-11-29 12:31:41,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3965073.3333333335, ans=0.125 2023-11-29 12:31:46,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3965073.3333333335, ans=0.125 2023-11-29 12:31:51,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3965140.0, ans=0.1 2023-11-29 12:31:52,100 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5600, loss[loss=0.06638, simple_loss=0.09497, pruned_loss=0.008953, audio_tagging_loss=0.009941, over 15500.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08938, pruned_loss=0.01187, audio_tagging_loss=0.008693, over 3054110.26 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:31:53,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3965140.0, ans=0.125 2023-11-29 12:32:25,884 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594800 2023-11-29 12:32:31,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3965340.0, ans=0.125 2023-11-29 12:32:37,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.303e+01 9.793e+01 1.041e+02 1.252e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 12:32:38,558 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:32:42,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0 2023-11-29 12:32:51,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3965406.6666666665, ans=0.2 2023-11-29 12:32:53,633 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5650, loss[loss=0.06506, simple_loss=0.08696, pruned_loss=0.01229, audio_tagging_loss=0.009288, over 14419.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.089, pruned_loss=0.01177, audio_tagging_loss=0.008769, over 3057655.01 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:33:09,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3965540.0, ans=0.2 2023-11-29 12:33:12,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3965540.0, ans=0.125 2023-11-29 12:33:28,140 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594850 2023-11-29 12:33:56,405 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5700, loss[loss=0.05424, simple_loss=0.06989, pruned_loss=0.008363, audio_tagging_loss=0.01093, over 15083.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08754, pruned_loss=0.0115, audio_tagging_loss=0.008849, over 3047401.46 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:33:56,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3965806.6666666665, ans=0.04949747468305833 2023-11-29 12:33:57,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3965806.6666666665, ans=0.125 2023-11-29 12:33:59,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3965806.6666666665, ans=0.125 2023-11-29 12:34:10,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3965873.3333333335, ans=0.125 2023-11-29 12:34:20,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3965940.0, ans=0.0 2023-11-29 12:34:29,241 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594900 2023-11-29 12:34:40,794 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.907e+01 9.442e+01 9.916e+01 1.221e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 12:34:53,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3966073.3333333335, ans=0.125 2023-11-29 12:34:58,508 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5750, loss[loss=0.07258, simple_loss=0.0979, pruned_loss=0.01412, audio_tagging_loss=0.009511, over 15040.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08798, pruned_loss=0.01153, audio_tagging_loss=0.008733, over 3044990.44 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:34:58,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3966140.0, ans=0.125 2023-11-29 12:34:59,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3966140.0, ans=0.0 2023-11-29 12:35:01,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=22.5 2023-11-29 12:35:02,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3966140.0, ans=0.2 2023-11-29 12:35:03,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. 
limit=22.5 2023-11-29 12:35:14,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3966206.6666666665, ans=0.0 2023-11-29 12:35:31,946 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 594950 2023-11-29 12:35:37,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3966340.0, ans=0.035 2023-11-29 12:35:42,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3966340.0, ans=0.95 2023-11-29 12:35:45,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3966340.0, ans=0.1 2023-11-29 12:35:59,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3966473.3333333335, ans=0.0 2023-11-29 12:36:00,185 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5800, loss[loss=0.07507, simple_loss=0.1046, pruned_loss=0.01385, audio_tagging_loss=0.008932, over 15526.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08843, pruned_loss=0.01159, audio_tagging_loss=0.008578, over 3046543.89 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:36:07,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3966473.3333333335, ans=0.125 2023-11-29 12:36:10,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2023-11-29 12:36:15,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3966540.0, ans=0.0 2023-11-29 12:36:22,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3966540.0, ans=0.125 2023-11-29 12:36:34,110 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595000 2023-11-29 12:36:34,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3966606.6666666665, ans=0.035 2023-11-29 12:36:45,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.166e+01 9.851e+01 1.059e+02 1.504e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 12:36:50,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3966740.0, ans=0.1 2023-11-29 12:37:01,721 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5850, loss[loss=0.07157, simple_loss=0.1023, pruned_loss=0.0164, audio_tagging_loss=0.004025, over 15605.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08822, pruned_loss=0.01164, audio_tagging_loss=0.008521, over 3053527.33 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:37:18,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3966873.3333333335, ans=0.125 2023-11-29 12:37:34,443 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595050 2023-11-29 12:37:41,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.75 vs. 
limit=12.0 2023-11-29 12:37:55,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2023-11-29 12:37:57,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3967073.3333333335, ans=0.125 2023-11-29 12:38:03,807 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5900, loss[loss=0.04945, simple_loss=0.0687, pruned_loss=0.005377, audio_tagging_loss=0.009721, over 14979.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08823, pruned_loss=0.0118, audio_tagging_loss=0.008493, over 3044291.35 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:38:37,020 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595100 2023-11-29 12:38:49,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.143e+01 9.896e+01 1.087e+02 1.374e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 12:38:53,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3967406.6666666665, ans=0.1 2023-11-29 12:38:53,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3967406.6666666665, ans=0.125 2023-11-29 12:38:56,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3967406.6666666665, ans=0.125 2023-11-29 12:38:59,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3967406.6666666665, ans=0.0 2023-11-29 12:39:02,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3967406.6666666665, ans=0.1 2023-11-29 12:39:04,747 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 5950, loss[loss=0.05888, simple_loss=0.07626, pruned_loss=0.009668, audio_tagging_loss=0.01108, over 14867.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08856, pruned_loss=0.0119, audio_tagging_loss=0.008421, over 3049223.28 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:39:10,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=12.0 2023-11-29 12:39:11,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3967473.3333333335, ans=0.09899494936611666 2023-11-29 12:39:25,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3967540.0, ans=0.0 2023-11-29 12:39:29,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.61 vs. limit=22.5 2023-11-29 12:39:30,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.57 vs. 
limit=15.0 2023-11-29 12:39:35,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3967606.6666666665, ans=0.125 2023-11-29 12:39:38,967 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595150 2023-11-29 12:39:50,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3967673.3333333335, ans=0.125 2023-11-29 12:39:55,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3967740.0, ans=0.2 2023-11-29 12:40:04,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-11-29 12:40:06,609 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6000, loss[loss=0.0887, simple_loss=0.1239, pruned_loss=0.01905, audio_tagging_loss=0.007691, over 15378.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08906, pruned_loss=0.01195, audio_tagging_loss=0.008369, over 3049264.21 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:40:06,610 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 12:40:40,250 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6218, 3.7409, 4.0091, 3.4338], device='cuda:1') 2023-11-29 12:40:46,471 INFO [train_asr.py:1267] (1/4) Epoch 50, validation: loss=0.05775, simple_loss=0.05043, pruned_loss=0.005339, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-29 12:40:46,471 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 12:40:47,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2023-11-29 12:40:50,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.78 vs. limit=15.0 2023-11-29 12:41:03,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3967873.3333333335, ans=0.0 2023-11-29 12:41:14,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3967940.0, ans=0.125 2023-11-29 12:41:17,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.39 vs. limit=22.5 2023-11-29 12:41:18,918 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595200 2023-11-29 12:41:32,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.055e+01 9.788e+01 1.026e+02 1.358e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 12:41:32,919 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 12:41:39,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3968073.3333333335, ans=0.1 2023-11-29 12:41:41,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3968073.3333333335, ans=0.2 2023-11-29 12:41:43,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3968073.3333333335, ans=0.0 2023-11-29 12:41:46,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968073.3333333335, ans=0.1 2023-11-29 12:41:48,279 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6050, loss[loss=0.05399, simple_loss=0.06514, pruned_loss=0.008945, audio_tagging_loss=0.01247, over 15750.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08868, pruned_loss=0.01182, audio_tagging_loss=0.008446, over 3045763.32 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:41:50,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3968140.0, ans=0.125 2023-11-29 12:41:56,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3968140.0, ans=0.0 2023-11-29 12:41:59,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968206.6666666665, ans=0.1 2023-11-29 12:42:12,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3968273.3333333335, ans=0.05 2023-11-29 12:42:20,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3968273.3333333335, ans=0.0 2023-11-29 12:42:21,769 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595250 2023-11-29 12:42:31,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3968340.0, ans=0.125 2023-11-29 12:42:40,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3968406.6666666665, ans=0.0 2023-11-29 12:42:49,425 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6100, loss[loss=0.08288, simple_loss=0.1082, pruned_loss=0.01948, audio_tagging_loss=0.009318, over 15315.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08915, pruned_loss=0.01187, audio_tagging_loss=0.008428, over 3049070.60 frames. 
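[annotation] The many ScheduledFloat lines report a named hyperparameter (a dropout_p, conv_skip_rate, balancer prob, bypass scale, ...) whose current value `ans` is a function of the training step. A minimal sketch of such a schedule, assuming piecewise-linear interpolation over batch_count with clamping at the endpoints (the real icefall class may differ):

```python
import bisect

class ScheduledFloatSketch:
    """Minimal sketch of a batch-count-scheduled float: piecewise-linear
    between (batch_count, value) knots, clamped at both ends. This is an
    assumption about icefall's ScheduledFloat, not a copy of it."""

    def __init__(self, *knots):  # knots: (batch_count, value), ascending
        self.xs = [k[0] for k in knots]
        self.ys = [k[1] for k in knots]

    def value_at(self, batch_count):
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Illustrative knots (not the real schedule): decay 0.2 -> 0.0 over 4k batches.
skip_rate = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.0))
print(skip_rate.value_at(3968273.33))  # 0.0, like the conv_skip_rate ans above
```

At batch_count near 3.97e6 every schedule would be far past its last knot, which would explain why the same ans values (0.0, 0.1, 0.125, 0.2) repeat verbatim from batch to batch.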
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:42:51,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3968473.3333333335, ans=0.07 2023-11-29 12:42:53,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3968473.3333333335, ans=0.125 2023-11-29 12:42:54,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3968473.3333333335, ans=0.125 2023-11-29 12:43:19,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3968606.6666666665, ans=0.0 2023-11-29 12:43:22,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595300 2023-11-29 12:43:27,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968673.3333333335, ans=0.1 2023-11-29 12:43:35,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.140e+01 9.748e+01 1.061e+02 1.283e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 12:43:37,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0 2023-11-29 12:43:40,506 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:43:46,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968740.0, ans=0.1 2023-11-29 12:43:52,123 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6150, loss[loss=0.08733, simple_loss=0.1181, pruned_loss=0.01855, audio_tagging_loss=0.009738, over 15982.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08857, pruned_loss=0.01189, audio_tagging_loss=0.008411, over 3052487.01 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:44:06,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2023-11-29 12:44:21,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3968940.0, ans=0.125 2023-11-29 12:44:24,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595350 2023-11-29 12:44:34,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3969006.6666666665, ans=0.1 2023-11-29 12:44:53,585 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6200, loss[loss=0.07251, simple_loss=0.1006, pruned_loss=0.01154, audio_tagging_loss=0.01069, over 14129.00 frames. ], tot_loss[loss=0.06389, simple_loss=0.08726, pruned_loss=0.01165, audio_tagging_loss=0.008617, over 3051499.04 frames. 
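[annotation] Each Whitening line compares a measured metric against a limit (6.0, 15.0, 22.5 above). One plausible reading, offered as an assumption rather than icefall's actual formula: the metric measures how far a feature covariance is from a multiple of the identity, equal to 1.0 for perfectly white features and growing as the eigenvalue spectrum spreads:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """Assumed form of the logged metric: how far each group's feature
    covariance C is from a multiple of the identity. The ratio
    mean(diag(C @ C)) / mean(diag(C))**2 equals 1.0 for perfectly white
    features and grows as the eigenvalue spectrum of C spreads out."""
    num_frames, num_channels = x.shape
    g = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, g).transpose(0, 1)  # (groups, T, g)
    x = x - x.mean(dim=1, keepdim=True)
    covar = x.transpose(1, 2) @ x / num_frames                # (groups, g, g)
    diag_mean = covar.diagonal(dim1=1, dim2=2).mean()
    sq_diag_mean = (covar @ covar).diagonal(dim1=1, dim2=2).mean()
    return (sq_diag_mean / (diag_mean ** 2 + 1e-20)).item()

print(whitening_metric(torch.randn(2000, 64), num_groups=1))  # ~1.0 for white noise
```

Under this reading, a line like "metric=4.84 vs. limit=15.0" says that module's activations are still well inside their allowed de-whitening budget.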
], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:45:05,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3969206.6666666665, ans=0.125 2023-11-29 12:45:13,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3969206.6666666665, ans=0.0 2023-11-29 12:45:19,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3969273.3333333335, ans=10.0 2023-11-29 12:45:19,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3969273.3333333335, ans=0.0 2023-11-29 12:45:27,192 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595400 2023-11-29 12:45:39,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 9.032e+01 9.591e+01 1.015e+02 1.293e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:45:55,763 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6250, loss[loss=0.06646, simple_loss=0.09805, pruned_loss=0.01149, audio_tagging_loss=0.005943, over 14960.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08783, pruned_loss=0.01175, audio_tagging_loss=0.008642, over 3051628.41 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:46:17,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3969540.0, ans=0.125 2023-11-29 12:46:28,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3969606.6666666665, ans=0.125 2023-11-29 12:46:29,239 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595450 2023-11-29 12:46:36,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3969673.3333333335, ans=0.125 2023-11-29 12:46:52,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3969740.0, ans=0.0 2023-11-29 12:46:57,360 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6300, loss[loss=0.08572, simple_loss=0.1217, pruned_loss=0.01731, audio_tagging_loss=0.00755, over 15335.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08886, pruned_loss=0.0119, audio_tagging_loss=0.008673, over 3053776.28 frames. 
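[annotation] The optim.py lines print five grad-norm statistics plus a threshold. The five numbers read as (min, 25%, median, 75%, max) over recently observed gradient norms, and the threshold tracks 2.0x the median: on the line above, 2.0 * 9.591e+01 = 1.918e+02, matching threshold=1.918e+02 exactly, and the same relation holds for the other Clipping_scale=2.0 entries in this section. A sketch of that bookkeeping (assumed, not the actual optimizer code):

```python
import torch

def clipping_report(recent_grad_norms, clipping_scale=2.0):
    """Sketch of the optim.py bookkeeping (assumed, not the actual code):
    quartiles over a window of recent gradient norms, with the clipping
    threshold tracking clipping_scale * median."""
    norms = torch.tensor(recent_grad_norms, dtype=torch.float32)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()
    pct = 100.0 * (norms > threshold).float().mean().item()
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
          + " ".join(f"{v:.3e}" for v in q.tolist())
          + f", threshold={threshold:.3e}, percent-clipped={pct}")
```

percent-clipped=0.0 throughout most of this section then just says no recent batch exceeded twice the median gradient norm.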
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:47:20,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3969873.3333333335, ans=0.125 2023-11-29 12:47:20,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3969873.3333333335, ans=0.2 2023-11-29 12:47:31,243 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595500 2023-11-29 12:47:37,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970006.6666666665, ans=0.1 2023-11-29 12:47:44,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.915e+01 9.519e+01 1.033e+02 1.401e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 12:47:53,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3970073.3333333335, ans=0.0 2023-11-29 12:47:59,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3970140.0, ans=0.0 2023-11-29 12:47:59,900 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6350, loss[loss=0.05856, simple_loss=0.07691, pruned_loss=0.01309, audio_tagging_loss=0.007015, over 14170.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08877, pruned_loss=0.01184, audio_tagging_loss=0.008684, over 3045802.36 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:48:16,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-29 12:48:18,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970206.6666666665, ans=0.1 2023-11-29 12:48:30,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3970273.3333333335, ans=0.125 2023-11-29 12:48:32,953 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595550 2023-11-29 12:48:57,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-29 12:49:01,860 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6400, loss[loss=0.07704, simple_loss=0.1052, pruned_loss=0.01808, audio_tagging_loss=0.006383, over 15351.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08907, pruned_loss=0.01182, audio_tagging_loss=0.008757, over 3045935.80 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:49:10,423 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:49:13,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-11-29 12:49:27,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.54 vs. 
limit=10.0 2023-11-29 12:49:29,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3970606.6666666665, ans=0.125 2023-11-29 12:49:32,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3970606.6666666665, ans=0.2 2023-11-29 12:49:35,473 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595600 2023-11-29 12:49:50,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 9.115e+01 9.887e+01 1.069e+02 1.285e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 12:49:53,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3970740.0, ans=0.125 2023-11-29 12:50:03,031 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6450, loss[loss=0.07611, simple_loss=0.09724, pruned_loss=0.01777, audio_tagging_loss=0.009726, over 15437.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08987, pruned_loss=0.01186, audio_tagging_loss=0.008788, over 3040010.70 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:50:13,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3970806.6666666665, ans=0.125 2023-11-29 12:50:27,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-11-29 12:50:34,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-29 12:50:37,727 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595650 2023-11-29 12:50:47,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3971006.6666666665, ans=0.125 2023-11-29 12:50:58,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3971073.3333333335, ans=0.2 2023-11-29 12:51:05,772 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6500, loss[loss=0.05928, simple_loss=0.07393, pruned_loss=0.008897, audio_tagging_loss=0.01342, over 15334.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08968, pruned_loss=0.01178, audio_tagging_loss=0.008785, over 3036701.70 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:51:05,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3971140.0, ans=0.0 2023-11-29 12:51:13,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2023-11-29 12:51:30,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3971273.3333333335, ans=0.125 2023-11-29 12:51:39,824 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595700 2023-11-29 12:51:41,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-29 12:51:47,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.72 vs. 
limit=12.0 2023-11-29 12:51:52,237 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:51:54,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.181e+01 9.755e+01 1.057e+02 1.258e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 12:52:07,926 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6550, loss[loss=0.0656, simple_loss=0.08496, pruned_loss=0.01182, audio_tagging_loss=0.0113, over 14642.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08973, pruned_loss=0.01192, audio_tagging_loss=0.00866, over 3036926.55 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:52:10,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3971473.3333333335, ans=0.2 2023-11-29 12:52:18,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3971473.3333333335, ans=0.1 2023-11-29 12:52:19,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2023-11-29 12:52:20,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3971540.0, ans=0.1 2023-11-29 12:52:30,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3971540.0, ans=0.125 2023-11-29 12:52:41,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595750 2023-11-29 12:52:44,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2023-11-29 12:52:47,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3971673.3333333335, ans=0.0 2023-11-29 12:52:50,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3971673.3333333335, ans=0.125 2023-11-29 12:52:59,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3971740.0, ans=0.0 2023-11-29 12:53:09,528 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6600, loss[loss=0.06964, simple_loss=0.1, pruned_loss=0.01394, audio_tagging_loss=0.0057, over 14467.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09029, pruned_loss=0.0119, audio_tagging_loss=0.00861, over 3038880.16 frames. 
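[annotation] The WithLoss lines attach a running auxiliary loss-sum to specific attention-weight tensors; every occurrence above reports 0.000e+00. A common way to implement such a hook, shown here only as a sketch of the idea (the icefall version is surely more elaborate), is an autograd function that is the identity in the forward pass but injects the penalty's gradient in the backward pass:

```python
import torch

class WithAuxLoss(torch.autograd.Function):
    """Sketch of the idea behind the WithLoss lines (an assumption about
    icefall's implementation): the forward pass returns x unchanged, while
    the backward pass adds a precomputed auxiliary-penalty gradient, so the
    penalty steers training without altering activations. A loss-sum of
    0.000e+00 then means the penalty is currently contributing nothing."""

    @staticmethod
    def forward(ctx, x, aux_grad):
        ctx.save_for_backward(aux_grad)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (aux_grad,) = ctx.saved_tensors
        return grad_output + aux_grad, None
```

Because the forward output is untouched, a permanently zero loss-sum costs nothing at inference or in the loss values logged above.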
], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:53:10,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3971806.6666666665, ans=0.5 2023-11-29 12:53:16,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3971806.6666666665, ans=0.0 2023-11-29 12:53:34,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3971940.0, ans=0.125 2023-11-29 12:53:40,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3971940.0, ans=0.0 2023-11-29 12:53:42,847 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595800 2023-11-29 12:53:52,195 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:53:56,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3972006.6666666665, ans=0.0 2023-11-29 12:53:57,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.899e+01 9.360e+01 1.006e+02 1.174e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-29 12:53:58,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3972073.3333333335, ans=0.07 2023-11-29 12:54:02,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:54:02,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3972073.3333333335, ans=10.0 2023-11-29 12:54:11,749 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6650, loss[loss=0.09272, simple_loss=0.1408, pruned_loss=0.01709, audio_tagging_loss=0.005236, over 14665.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09063, pruned_loss=0.01191, audio_tagging_loss=0.008444, over 3044689.65 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:54:20,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3972140.0, ans=0.1 2023-11-29 12:54:37,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3972273.3333333335, ans=0.0 2023-11-29 12:54:38,201 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:54:40,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0 2023-11-29 12:54:44,990 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595850 2023-11-29 12:54:49,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3972340.0, ans=0.0 2023-11-29 12:54:50,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3972340.0, ans=0.125 2023-11-29 12:55:13,851 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6700, loss[loss=0.06846, simple_loss=0.1067, pruned_loss=0.008861, audio_tagging_loss=0.006236, over 15085.00 frames. 
], tot_loss[loss=0.06485, simple_loss=0.08934, pruned_loss=0.01173, audio_tagging_loss=0.008451, over 3042676.24 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:55:14,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0 2023-11-29 12:55:25,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5 2023-11-29 12:55:29,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=12.0 2023-11-29 12:55:43,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2023-11-29 12:55:46,566 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595900 2023-11-29 12:55:48,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3972606.6666666665, ans=0.125 2023-11-29 12:55:49,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=8.0 2023-11-29 12:55:51,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=22.5 2023-11-29 12:55:58,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3972673.3333333335, ans=0.2 2023-11-29 12:56:01,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.149e+01 9.695e+01 1.030e+02 1.289e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 12:56:08,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3972740.0, ans=0.2 2023-11-29 12:56:13,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3972740.0, ans=0.025 2023-11-29 12:56:15,574 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6750, loss[loss=0.07118, simple_loss=0.09377, pruned_loss=0.01382, audio_tagging_loss=0.01048, over 15765.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08918, pruned_loss=0.01165, audio_tagging_loss=0.00849, over 3038740.76 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:56:17,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3972806.6666666665, ans=0.125 2023-11-29 12:56:27,160 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:56:44,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3972940.0, ans=0.0 2023-11-29 12:56:49,837 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 595950 2023-11-29 12:57:00,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. 
limit=12.0 2023-11-29 12:57:03,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3973006.6666666665, ans=0.0 2023-11-29 12:57:05,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-29 12:57:10,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3973073.3333333335, ans=0.2 2023-11-29 12:57:18,164 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6800, loss[loss=0.0645, simple_loss=0.08758, pruned_loss=0.01209, audio_tagging_loss=0.008616, over 15518.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08878, pruned_loss=0.01166, audio_tagging_loss=0.008532, over 3039039.90 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:57:25,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2023-11-29 12:57:30,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3973206.6666666665, ans=0.125 2023-11-29 12:57:36,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2023-11-29 12:57:51,587 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596000 2023-11-29 12:57:57,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3973340.0, ans=0.1 2023-11-29 12:58:03,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3973340.0, ans=0.2 2023-11-29 12:58:09,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.975e+01 9.560e+01 1.019e+02 1.968e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-29 12:58:22,179 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6850, loss[loss=0.08676, simple_loss=0.1234, pruned_loss=0.01757, audio_tagging_loss=0.007474, over 15505.00 frames. ], tot_loss[loss=0.06367, simple_loss=0.08747, pruned_loss=0.0114, audio_tagging_loss=0.008535, over 3039335.75 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:58:33,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3973473.3333333335, ans=0.125 2023-11-29 12:58:43,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3973540.0, ans=0.0 2023-11-29 12:58:49,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.65 vs. 
limit=22.5 2023-11-29 12:58:56,553 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596050 2023-11-29 12:59:13,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3973740.0, ans=0.125 2023-11-29 12:59:20,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3973740.0, ans=0.1 2023-11-29 12:59:21,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3973740.0, ans=0.125 2023-11-29 12:59:24,847 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6900, loss[loss=0.07311, simple_loss=0.1017, pruned_loss=0.01478, audio_tagging_loss=0.007481, over 15067.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08867, pruned_loss=0.01162, audio_tagging_loss=0.008448, over 3045382.11 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:59:25,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3973806.6666666665, ans=0.125 2023-11-29 12:59:37,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3973873.3333333335, ans=0.125 2023-11-29 12:59:57,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3973940.0, ans=0.125 2023-11-29 12:59:57,999 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596100 2023-11-29 13:00:13,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.949e+01 9.796e+01 1.035e+02 1.230e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 13:00:13,558 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 13:00:25,775 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 6950, loss[loss=0.07433, simple_loss=0.1017, pruned_loss=0.01426, audio_tagging_loss=0.009235, over 14761.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08864, pruned_loss=0.01153, audio_tagging_loss=0.008366, over 3039760.93 frames. 
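[annotation] These Exclude-cut warnings are the 1-second AudioSet clips whose placeholder transcript is longer than the clip itself once features are subsampled: 100 input frames shrink to 23 encoder frames, one fewer than the 24 BPE tokens, so the transducer loss has no valid alignment. A sketch of the filter implied by the warning, where the frontend frame arithmetic is an assumption chosen to reproduce the logged 100 -> 23 reduction:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Filter implied by the WARNING above: a transducer needs at least one
    encoder frame per BPE token. The frontend arithmetic is an assumption
    chosen to reproduce the logged 100 -> 23 frame reduction."""
    t_out = ((num_frames - 7) // 2) // 2  # conv subsampling, assumed
    return t_out >= num_tokens

print(keep_cut(100, 24))  # False: 23 encoder frames < 24 tokens, so excluded
```

Only the short unbalanced/*.wav AudioSet cuts trip this check; the LibriSpeech cuts muxed into the same stream are long enough that the filter never fires on them here.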
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:00:35,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3974140.0, ans=0.125 2023-11-29 13:00:37,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3974206.6666666665, ans=0.0 2023-11-29 13:00:42,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3974206.6666666665, ans=0.0 2023-11-29 13:00:59,213 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596150 2023-11-29 13:01:00,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3974273.3333333335, ans=0.125 2023-11-29 13:01:06,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-11-29 13:01:11,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3974340.0, ans=0.125 2023-11-29 13:01:13,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3974340.0, ans=0.95 2023-11-29 13:01:14,810 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:01:17,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.47 vs. limit=6.0 2023-11-29 13:01:24,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3974406.6666666665, ans=0.1 2023-11-29 13:01:27,349 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7000, loss[loss=0.0667, simple_loss=0.09648, pruned_loss=0.0128, audio_tagging_loss=0.005665, over 14532.00 frames. ], tot_loss[loss=0.06348, simple_loss=0.08753, pruned_loss=0.01132, audio_tagging_loss=0.008395, over 3041711.92 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:01:34,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3974473.3333333335, ans=0.0 2023-11-29 13:02:01,364 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596200 2023-11-29 13:02:16,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.861e+01 9.690e+01 1.060e+02 1.703e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 13:02:17,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2023-11-29 13:02:29,631 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7050, loss[loss=0.07724, simple_loss=0.1179, pruned_loss=0.01146, audio_tagging_loss=0.006821, over 16278.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08934, pruned_loss=0.01153, audio_tagging_loss=0.008443, over 3051986.49 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:02:30,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.93 vs. 
limit=15.0 2023-11-29 13:02:48,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3974873.3333333335, ans=0.0 2023-11-29 13:03:02,748 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596250 2023-11-29 13:03:03,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-29 13:03:04,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3974940.0, ans=0.125 2023-11-29 13:03:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3975006.6666666665, ans=0.125 2023-11-29 13:03:18,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3975073.3333333335, ans=0.0 2023-11-29 13:03:31,616 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7100, loss[loss=0.06775, simple_loss=0.08727, pruned_loss=0.01762, audio_tagging_loss=0.006485, over 15206.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08934, pruned_loss=0.01151, audio_tagging_loss=0.008388, over 3060303.03 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:03:34,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3975140.0, ans=0.0 2023-11-29 13:03:38,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3975140.0, ans=0.125 2023-11-29 13:04:00,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3975273.3333333335, ans=0.0 2023-11-29 13:04:05,131 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596300 2023-11-29 13:04:20,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.433e+01 1.002e+02 1.073e+02 1.406e+02, threshold=2.003e+02, percent-clipped=0.0 2023-11-29 13:04:31,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-29 13:04:32,943 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7150, loss[loss=0.05718, simple_loss=0.07653, pruned_loss=0.01021, audio_tagging_loss=0.008708, over 14547.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0893, pruned_loss=0.01165, audio_tagging_loss=0.008428, over 3059323.31 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:05:06,673 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596350 2023-11-29 13:05:12,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2023-11-29 13:05:34,674 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7200, loss[loss=0.06758, simple_loss=0.09585, pruned_loss=0.01039, audio_tagging_loss=0.009269, over 15884.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08919, pruned_loss=0.01163, audio_tagging_loss=0.008589, over 3059266.70 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:05:39,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3975806.6666666665, ans=0.07 2023-11-29 13:06:08,425 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596400 2023-11-29 13:06:12,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3976006.6666666665, ans=0.1 2023-11-29 13:06:24,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.018e+01 9.297e+01 9.851e+01 1.057e+02 1.501e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 13:06:36,942 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7250, loss[loss=0.08171, simple_loss=0.117, pruned_loss=0.01583, audio_tagging_loss=0.007386, over 15373.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08901, pruned_loss=0.01154, audio_tagging_loss=0.008658, over 3053368.58 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:06:39,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3976140.0, ans=0.0 2023-11-29 13:06:52,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3976206.6666666665, ans=0.0 2023-11-29 13:07:09,946 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596450 2023-11-29 13:07:21,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3976340.0, ans=6.0 2023-11-29 13:07:38,456 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7300, loss[loss=0.06087, simple_loss=0.08551, pruned_loss=0.01011, audio_tagging_loss=0.008001, over 14748.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08996, pruned_loss=0.01161, audio_tagging_loss=0.008521, over 3044095.42 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:08:00,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2023-11-29 13:08:11,911 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596500 2023-11-29 13:08:28,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.133e+01 9.755e+01 1.029e+02 1.413e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 13:08:32,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3976740.0, ans=0.0 2023-11-29 13:08:39,162 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7350, loss[loss=0.06539, simple_loss=0.0912, pruned_loss=0.0118, audio_tagging_loss=0.007983, over 14925.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08951, pruned_loss=0.01157, audio_tagging_loss=0.008416, over 3045996.63 frames. 
], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:09:06,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3976940.0, ans=0.125 2023-11-29 13:09:13,206 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596550 2023-11-29 13:09:15,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3977006.6666666665, ans=0.125 2023-11-29 13:09:26,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2023-11-29 13:09:36,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5 2023-11-29 13:09:37,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3977073.3333333335, ans=0.125 2023-11-29 13:09:39,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-29 13:09:40,653 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7400, loss[loss=0.08218, simple_loss=0.1058, pruned_loss=0.01755, audio_tagging_loss=0.01174, over 15403.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08925, pruned_loss=0.01169, audio_tagging_loss=0.008444, over 3043223.32 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:09:43,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.96 vs. limit=15.0 2023-11-29 13:10:14,028 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596600 2023-11-29 13:10:16,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-11-29 13:10:21,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3977340.0, ans=0.125 2023-11-29 13:10:31,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.135e+01 9.143e+01 9.729e+01 1.063e+02 1.656e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 13:10:43,574 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7450, loss[loss=0.06113, simple_loss=0.078, pruned_loss=0.01216, audio_tagging_loss=0.009973, over 16162.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08838, pruned_loss=0.01168, audio_tagging_loss=0.008482, over 3036379.19 frames. ], batch size: 64, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:10:49,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3977473.3333333335, ans=0.0 2023-11-29 13:11:16,234 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596650 2023-11-29 13:11:16,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3977606.6666666665, ans=0.125 2023-11-29 13:11:44,409 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7500, loss[loss=0.06182, simple_loss=0.07897, pruned_loss=0.01053, audio_tagging_loss=0.01181, over 15023.00 frames. 
], tot_loss[loss=0.06428, simple_loss=0.08824, pruned_loss=0.01176, audio_tagging_loss=0.008401, over 3037521.72 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:11:51,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3977806.6666666665, ans=22.5 2023-11-29 13:12:06,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-29 13:12:18,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596700 2023-11-29 13:12:34,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.233e+01 9.870e+01 1.058e+02 1.396e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 13:12:41,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3978073.3333333335, ans=0.125 2023-11-29 13:12:45,803 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7550, loss[loss=0.05163, simple_loss=0.07282, pruned_loss=0.006769, audio_tagging_loss=0.00845, over 16130.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08872, pruned_loss=0.01184, audio_tagging_loss=0.00836, over 3036181.51 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:12:50,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3978140.0, ans=0.125 2023-11-29 13:12:52,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3978140.0, ans=0.2 2023-11-29 13:12:54,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3978140.0, ans=0.05 2023-11-29 13:13:18,393 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596750 2023-11-29 13:13:21,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3978340.0, ans=0.125 2023-11-29 13:13:21,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3978340.0, ans=0.125 2023-11-29 13:13:25,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. 
2023-11-29 13:13:25,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5
2023-11-29 13:13:28,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3978340.0, ans=0.125
2023-11-29 13:13:29,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3978340.0, ans=0.125
2023-11-29 13:13:42,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3978406.6666666665, ans=0.1
2023-11-29 13:13:43,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3978406.6666666665, ans=0.1
2023-11-29 13:13:46,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3978473.3333333335, ans=0.0
2023-11-29 13:13:48,025 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7600, loss[loss=0.07094, simple_loss=0.09612, pruned_loss=0.01552, audio_tagging_loss=0.007355, over 15174.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08833, pruned_loss=0.0117, audio_tagging_loss=0.008393, over 3040839.96 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:14:01,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3978540.0, ans=0.0
2023-11-29 13:14:13,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3978606.6666666665, ans=0.2
2023-11-29 13:14:17,811 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:14:19,883 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596800
2023-11-29 13:14:22,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3978673.3333333335, ans=0.05
2023-11-29 13:14:22,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3978673.3333333335, ans=0.0
2023-11-29 13:14:37,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.734e+01 9.698e+01 1.076e+02 1.664e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-29 13:14:48,319 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7650, loss[loss=0.0677, simple_loss=0.09064, pruned_loss=0.01589, audio_tagging_loss=0.006494, over 14902.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08909, pruned_loss=0.01177, audio_tagging_loss=0.008302, over 3044868.27 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:15:21,927 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596850
2023-11-29 13:15:22,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0
2023-11-29 13:15:27,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3979006.6666666665, ans=0.1
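
The ScheduledFloat lines record hyperparameters (balancer probabilities, skip rates, dropout probabilities, bypass scales) that are functions of batch_count rather than constants; the "ans=" value is the schedule evaluated at the current count. A minimal sketch of such a piecewise-linear schedule, with made-up breakpoints (the actual schedules live in the recipe's scaling.py and are not visible in this log):

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count; illustrative only."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches
    # (hypothetical breakpoints); this late in training the schedule is flat,
    # which is why "ans=0.1" repeats in the dropout_p lines above.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(3978406.67))  # -> 0.1
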
2023-11-29 13:15:50,310 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7700, loss[loss=0.07467, simple_loss=0.09346, pruned_loss=0.01776, audio_tagging_loss=0.01018, over 14095.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08935, pruned_loss=0.01173, audio_tagging_loss=0.008288, over 3049249.42 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:15:53,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3979140.0, ans=0.2
2023-11-29 13:16:17,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-29 13:16:24,153 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596900
2023-11-29 13:16:38,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3979406.6666666665, ans=0.125
2023-11-29 13:16:40,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.160e+01 9.812e+01 1.042e+02 1.331e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 13:16:44,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3979406.6666666665, ans=0.125
2023-11-29 13:16:46,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3979406.6666666665, ans=0.2
2023-11-29 13:16:52,039 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7750, loss[loss=0.08483, simple_loss=0.1229, pruned_loss=0.01809, audio_tagging_loss=0.005265, over 16157.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08871, pruned_loss=0.01156, audio_tagging_loss=0.008314, over 3050412.15 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:17:25,952 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 596950
2023-11-29 13:17:27,321 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:17:54,653 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7800, loss[loss=0.06509, simple_loss=0.07793, pruned_loss=0.01472, audio_tagging_loss=0.01141, over 13779.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08935, pruned_loss=0.01171, audio_tagging_loss=0.008328, over 3042242.96 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:17:54,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3979806.6666666665, ans=0.125
2023-11-29 13:17:59,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3979806.6666666665, ans=0.0
2023-11-29 13:18:11,004 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:18:17,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0
2023-11-29 13:18:26,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3979940.0, ans=0.2
2023-11-29 13:18:27,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597000
2023-11-29 13:18:27,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3979940.0, ans=0.125
2023-11-29 13:18:36,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3980006.6666666665, ans=0.1
2023-11-29 13:18:46,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.044e+01 9.654e+01 1.045e+02 1.431e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-29 13:18:47,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3980073.3333333335, ans=0.125
2023-11-29 13:18:57,701 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7850, loss[loss=0.05418, simple_loss=0.07711, pruned_loss=0.009589, audio_tagging_loss=0.006043, over 15218.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08921, pruned_loss=0.01166, audio_tagging_loss=0.008375, over 3037447.21 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:19:01,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3980140.0, ans=0.125
2023-11-29 13:19:09,056 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:19:31,698 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597050
2023-11-29 13:19:44,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3980340.0, ans=0.0
2023-11-29 13:19:47,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2023-11-29 13:19:50,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0
2023-11-29 13:19:57,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0
2023-11-29 13:19:59,733 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7900, loss[loss=0.06615, simple_loss=0.08421, pruned_loss=0.01425, audio_tagging_loss=0.009788, over 15682.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.0892, pruned_loss=0.01163, audio_tagging_loss=0.00847, over 3036617.16 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:20:04,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3980473.3333333335, ans=0.125
2023-11-29 13:20:14,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3980540.0, ans=0.0
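
The optim.py lines report five ascending statistics of recent gradient norms (they read as min, 25th, 50th, 75th percentile and max), plus the clipping threshold and the fraction of recent batches that were clipped. With Clipping_scale=2.0 the logged threshold is consistently twice the logged median (e.g. 1.931e+02 = 2 x 9.654e+01 above), which suggests a median-based adaptive clip. A hedged sketch of that bookkeeping; the recipe's actual optimizer surely differs in detail:

    from collections import deque
    import torch

    class MedianClipper:
        """Clip gradients to clipping_scale * median of recent global grad
        norms, and track the quartile statistics seen in the log lines.
        Illustrative only."""

        def __init__(self, clipping_scale=2.0, window=1000):
            self.clipping_scale = clipping_scale
            self.history = deque(maxlen=window)
            self.clipped = 0
            self.seen = 0

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            # Global L2 norm over all parameter gradients.
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.history.append(norm)
            hist = torch.tensor(list(self.history))
            quartiles = torch.quantile(
                hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
            )
            threshold = self.clipping_scale * quartiles[2].item()
            self.seen += 1
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            percent_clipped = 100.0 * self.clipped / self.seen
            return quartiles.tolist(), threshold, percent_clipped

The percent-clipped=0.0 throughout this stretch says the clip is essentially inactive here: at 2x the median, only extreme outlier batches would be touched.
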
2023-11-29 13:20:26,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0
2023-11-29 13:20:32,900 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597100
2023-11-29 13:20:41,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3980673.3333333335, ans=0.2
2023-11-29 13:20:41,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3980673.3333333335, ans=22.5
2023-11-29 13:20:52,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.084e+01 9.995e+01 1.068e+02 1.283e+02, threshold=1.999e+02, percent-clipped=0.0
2023-11-29 13:20:54,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3980740.0, ans=0.125
2023-11-29 13:20:57,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0
2023-11-29 13:21:01,672 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 7950, loss[loss=0.05995, simple_loss=0.0861, pruned_loss=0.009355, audio_tagging_loss=0.007541, over 15021.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08893, pruned_loss=0.01167, audio_tagging_loss=0.008631, over 3041727.28 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:21:20,009 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:21:35,898 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597150
2023-11-29 13:21:44,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0
2023-11-29 13:22:03,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3981140.0, ans=0.1
2023-11-29 13:22:04,454 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8000, loss[loss=0.05757, simple_loss=0.07922, pruned_loss=0.007422, audio_tagging_loss=0.01054, over 15288.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08905, pruned_loss=0.01164, audio_tagging_loss=0.008654, over 3037137.68 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:22:06,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3981140.0, ans=0.0
2023-11-29 13:22:16,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0
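
The WARNING above shows why some AudioSet cuts are dropped from the transducer branch: they carry only a dummy placeholder transcript, and after the encoder's subsampling a 100-frame cut yields just 23 output frames, fewer than its 24 tokens, which a transducer loss cannot align. The frame arithmetic below reproduces the logged 100 -> 23; the exact convolution geometry and the exclusion rule itself are assumptions made to match these numbers:

    def frames_after_subsampling(num_frames: int) -> int:
        # Reproduces the logged numbers (100 -> 23); assumed to model the
        # encoder_embed's two stride-2 stages with edge trimming.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed rule: the alignment needs at least one output frame
        # per target token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the cut in the warning above is excluded
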
2023-11-29 13:22:29,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0
2023-11-29 13:22:36,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3981273.3333333335, ans=0.0
2023-11-29 13:22:37,483 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597200
2023-11-29 13:22:37,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3981273.3333333335, ans=0.125
2023-11-29 13:22:40,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3981340.0, ans=0.125
2023-11-29 13:22:57,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.831e+01 9.452e+01 1.032e+02 1.267e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-29 13:23:02,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3981406.6666666665, ans=0.2
2023-11-29 13:23:06,846 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8050, loss[loss=0.0644, simple_loss=0.09027, pruned_loss=0.01258, audio_tagging_loss=0.006692, over 17094.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08802, pruned_loss=0.01153, audio_tagging_loss=0.008751, over 3032672.22 frames. ], batch size: 65, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:23:11,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3981473.3333333335, ans=0.025
2023-11-29 13:23:28,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0
2023-11-29 13:23:37,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0
2023-11-29 13:23:40,534 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597250
2023-11-29 13:23:49,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3981673.3333333335, ans=0.2
2023-11-29 13:23:54,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3981673.3333333335, ans=0.0
2023-11-29 13:24:01,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3981740.0, ans=0.2
2023-11-29 13:24:08,351 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8100, loss[loss=0.07574, simple_loss=0.09806, pruned_loss=0.01563, audio_tagging_loss=0.01108, over 15789.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08813, pruned_loss=0.01157, audio_tagging_loss=0.008723, over 3031108.22 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:24:10,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3981806.6666666665, ans=0.0
2023-11-29 13:24:20,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0
2023-11-29 13:24:41,066 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597300
2023-11-29 13:24:48,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3982006.6666666665, ans=0.0
2023-11-29 13:24:48,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3982006.6666666665, ans=0.125
2023-11-29 13:24:55,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3982073.3333333335, ans=0.0
2023-11-29 13:24:59,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.024e+01 9.571e+01 1.071e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-29 13:25:08,620 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8150, loss[loss=0.0781, simple_loss=0.1072, pruned_loss=0.01616, audio_tagging_loss=0.008331, over 15418.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08778, pruned_loss=0.01168, audio_tagging_loss=0.008554, over 3032191.20 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:25:14,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3982140.0, ans=10.0
2023-11-29 13:25:17,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3982140.0, ans=0.0
2023-11-29 13:25:17,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5
2023-11-29 13:25:18,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3982140.0, ans=0.0
2023-11-29 13:25:30,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=22.5
2023-11-29 13:25:38,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3982273.3333333335, ans=0.125
2023-11-29 13:25:41,904 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597350
2023-11-29 13:25:59,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3982406.6666666665, ans=0.125
2023-11-29 13:26:02,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3982406.6666666665, ans=0.2
2023-11-29 13:26:07,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3982406.6666666665, ans=0.125
2023-11-29 13:26:10,388 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8200, loss[loss=0.07398, simple_loss=0.1135, pruned_loss=0.01196, audio_tagging_loss=0.005283, over 15495.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08865, pruned_loss=0.0118, audio_tagging_loss=0.00846, over 3029040.20 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:26:13,947 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:26:17,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0
2023-11-29 13:26:30,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0
2023-11-29 13:26:32,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3982540.0, ans=0.2
2023-11-29 13:26:38,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0
2023-11-29 13:26:43,606 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597400
2023-11-29 13:26:48,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3982673.3333333335, ans=0.125
2023-11-29 13:26:50,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3982673.3333333335, ans=0.2
2023-11-29 13:27:03,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.247e+01 9.743e+01 1.041e+02 1.327e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-29 13:27:07,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3982740.0, ans=0.1
2023-11-29 13:27:12,550 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8250, loss[loss=0.07224, simple_loss=0.1063, pruned_loss=0.01252, audio_tagging_loss=0.006586, over 14982.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08882, pruned_loss=0.01178, audio_tagging_loss=0.008393, over 3034280.20 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:27:15,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3982806.6666666665, ans=0.04949747468305833
2023-11-29 13:27:22,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3982806.6666666665, ans=0.5
2023-11-29 13:27:45,890 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597450
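
The Whitening lines track, per module, how far the channel covariance of activations is from isotropic: a value of 1.0 means perfectly white, while a value near the per-group channel count means all the variance sits in one direction; the limit is only acted on when exceeded. One common way to compute such a spectral-flatness style metric, sketched here (the recipe's scaling.py may define it differently):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns a ratio that is 1.0 when
        the per-group covariance is proportional to the identity and grows
        toward the group size as the spectrum becomes more lopsided."""
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)                     # (groups, frames, c)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, c, c)
        c = cov.shape[-1]
        num = (cov * cov).sum(dim=(1, 2)) * c     # ~ C * sum of squared eigenvalues
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2  # (sum of eigenvalues)^2
        return (num / den).mean().item()

    x = torch.randn(2000, 384)     # roughly white activations
    print(whitening_metric(x))     # ~1.0; structured activations score higher

During training, a module that measures metric above its limit would push its activations back toward whiteness via a small gradient correction; the log lines simply record metric vs. limit at measurement time, and most modules here sit comfortably under their limits.
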
2023-11-29 13:28:12,929 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8300, loss[loss=0.05673, simple_loss=0.07299, pruned_loss=0.008499, audio_tagging_loss=0.01174, over 14409.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08854, pruned_loss=0.01175, audio_tagging_loss=0.008402, over 3041790.90 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:28:18,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3983140.0, ans=0.125
2023-11-29 13:28:20,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3983140.0, ans=0.125
2023-11-29 13:28:28,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3983206.6666666665, ans=0.1
2023-11-29 13:28:40,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3983273.3333333335, ans=0.125
2023-11-29 13:28:42,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3983273.3333333335, ans=0.2
2023-11-29 13:28:46,621 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597500
2023-11-29 13:28:56,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3983340.0, ans=0.125
2023-11-29 13:29:00,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3983406.6666666665, ans=0.2
2023-11-29 13:29:05,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.254e+01 9.916e+01 1.057e+02 1.310e+02, threshold=1.983e+02, percent-clipped=0.0
2023-11-29 13:29:14,569 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8350, loss[loss=0.04865, simple_loss=0.06603, pruned_loss=0.008111, audio_tagging_loss=0.007523, over 14940.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08892, pruned_loss=0.01182, audio_tagging_loss=0.008341, over 3049149.16 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:29:40,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0
2023-11-29 13:29:47,134 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597550
2023-11-29 13:29:50,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0
2023-11-29 13:29:54,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3983673.3333333335, ans=0.125
2023-11-29 13:30:06,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3983740.0, ans=0.125
2023-11-29 13:30:16,383 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8400, loss[loss=0.08213, simple_loss=0.1199, pruned_loss=0.01577, audio_tagging_loss=0.006404, over 15543.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08852, pruned_loss=0.01165, audio_tagging_loss=0.008321, over 3042750.44 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:30:19,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3983806.6666666665, ans=0.0
2023-11-29 13:30:49,688 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597600
2023-11-29 13:30:50,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3983940.0, ans=0.125
2023-11-29 13:31:10,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.074e+01 9.912e+01 1.068e+02 1.273e+02, threshold=1.982e+02, percent-clipped=0.0
2023-11-29 13:31:17,382 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8450, loss[loss=0.06508, simple_loss=0.0809, pruned_loss=0.01343, audio_tagging_loss=0.0112, over 15125.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08821, pruned_loss=0.01168, audio_tagging_loss=0.008405, over 3040331.27 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:31:30,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0
2023-11-29 13:31:46,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3984273.3333333335, ans=0.125
2023-11-29 13:31:51,367 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597650
2023-11-29 13:31:56,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3984340.0, ans=10.0
2023-11-29 13:32:01,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3984340.0, ans=0.1
2023-11-29 13:32:02,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3984340.0, ans=0.1
2023-11-29 13:32:18,953 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8500, loss[loss=0.05735, simple_loss=0.07807, pruned_loss=0.01006, audio_tagging_loss=0.008255, over 14791.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08863, pruned_loss=0.01163, audio_tagging_loss=0.008352, over 3042111.88 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:32:34,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3984540.0, ans=0.1
2023-11-29 13:32:53,081 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597700
2023-11-29 13:33:03,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3984673.3333333335, ans=0.125
2023-11-29 13:33:13,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 8.994e+01 9.750e+01 1.041e+02 1.321e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-29 13:33:16,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3984740.0, ans=0.2
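
The grad_scale figure in the batch records is the loss-scaling factor of mixed-precision training, and it moves in powers of two: here it drops from 32.0 at batch 8400 to 16.0 at batch 8450, which is the standard torch.cuda.amp.GradScaler behaviour of halving on an overflowed step and doubling again after a run of clean steps. A sketch of that loop, hedged since the recipe wraps it in its own trainer:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        # step() skips the optimizer update if the unscaled gradients
        # contain inf/nan; update() then halves the scale, or doubles it
        # after growth_interval consecutive clean steps.
        scaler.step(optimizer)
        scaler.update()
        return loss.detach(), scaler.get_scale()  # cf. grad_scale: 32.0 -> 16.0
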
2023-11-29 13:33:21,350 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8550, loss[loss=0.06019, simple_loss=0.07696, pruned_loss=0.01285, audio_tagging_loss=0.008864, over 15299.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08822, pruned_loss=0.01176, audio_tagging_loss=0.008407, over 3044815.70 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:33:37,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3984873.3333333335, ans=0.0
2023-11-29 13:33:48,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3984940.0, ans=0.125
2023-11-29 13:33:51,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3984940.0, ans=0.125
2023-11-29 13:33:54,579 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597750
2023-11-29 13:34:02,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3985006.6666666665, ans=0.125
2023-11-29 13:34:22,932 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8600, loss[loss=0.05714, simple_loss=0.08148, pruned_loss=0.007551, audio_tagging_loss=0.008847, over 14646.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08793, pruned_loss=0.01151, audio_tagging_loss=0.008523, over 3046661.72 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:34:47,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=22.5
2023-11-29 13:34:49,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3985273.3333333335, ans=0.125
2023-11-29 13:34:49,360 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:34:53,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3985273.3333333335, ans=0.125
2023-11-29 13:34:57,489 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597800
2023-11-29 13:35:02,610 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:35:18,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.848e+01 9.575e+01 1.022e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-29 13:35:18,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3985406.6666666665, ans=0.125
2023-11-29 13:35:25,290 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8650, loss[loss=0.06672, simple_loss=0.08938, pruned_loss=0.01297, audio_tagging_loss=0.009063, over 15785.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08835, pruned_loss=0.01152, audio_tagging_loss=0.00848, over 3049516.90 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:35:34,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3985473.3333333335, ans=0.125
2023-11-29 13:35:58,893 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597850
2023-11-29 13:36:10,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3985673.3333333335, ans=0.125
2023-11-29 13:36:27,085 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8700, loss[loss=0.07589, simple_loss=0.1091, pruned_loss=0.01404, audio_tagging_loss=0.007299, over 14721.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.0879, pruned_loss=0.01147, audio_tagging_loss=0.008608, over 3046375.32 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:36:59,847 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597900
2023-11-29 13:37:02,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0
2023-11-29 13:37:15,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3986073.3333333335, ans=0.0
2023-11-29 13:37:21,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 9.317e+01 9.883e+01 1.072e+02 1.210e+02, threshold=1.977e+02, percent-clipped=0.0
2023-11-29 13:37:21,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3986073.3333333335, ans=0.1
2023-11-29 13:37:28,345 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8750, loss[loss=0.07654, simple_loss=0.1026, pruned_loss=0.01611, audio_tagging_loss=0.00914, over 14086.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08843, pruned_loss=0.01162, audio_tagging_loss=0.008651, over 3051002.23 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:37:30,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3986140.0, ans=0.125
2023-11-29 13:37:31,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0
2023-11-29 13:37:38,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3986140.0, ans=0.05
2023-11-29 13:37:51,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3986273.3333333335, ans=0.125
2023-11-29 13:38:01,155 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 597950
2023-11-29 13:38:19,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3986406.6666666665, ans=0.125
2023-11-29 13:38:23,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3986406.6666666665, ans=0.0
2023-11-29 13:38:29,584 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8800, loss[loss=0.05406, simple_loss=0.07339, pruned_loss=0.005309, audio_tagging_loss=0.01206, over 14831.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09009, pruned_loss=0.01188, audio_tagging_loss=0.008565, over 3058617.46 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:38:33,412 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:38:51,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0
2023-11-29 13:38:53,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3986606.6666666665, ans=0.125
2023-11-29 13:38:53,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3986606.6666666665, ans=0.125
2023-11-29 13:39:00,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0
2023-11-29 13:39:01,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5
2023-11-29 13:39:02,950 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598000
2023-11-29 13:39:12,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3986673.3333333335, ans=0.0
2023-11-29 13:39:14,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0
2023-11-29 13:39:23,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.465e+01 1.025e+02 1.121e+02 1.304e+02, threshold=2.051e+02, percent-clipped=0.0
2023-11-29 13:39:27,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3986740.0, ans=0.2
2023-11-29 13:39:31,213 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8850, loss[loss=0.05498, simple_loss=0.07745, pruned_loss=0.008457, audio_tagging_loss=0.007802, over 15695.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09025, pruned_loss=0.01178, audio_tagging_loss=0.008609, over 3056379.59 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:39:33,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3986806.6666666665, ans=0.1
2023-11-29 13:39:35,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3986806.6666666665, ans=0.125
2023-11-29 13:39:40,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=10.0
2023-11-29 13:39:46,143 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:39:54,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3986940.0, ans=0.2
2023-11-29 13:40:04,294 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598050
2023-11-29 13:40:26,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3987073.3333333335, ans=0.0
2023-11-29 13:40:27,815 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:40:32,893 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8900, loss[loss=0.07531, simple_loss=0.1105, pruned_loss=0.01463, audio_tagging_loss=0.005432, over 15494.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08989, pruned_loss=0.01162, audio_tagging_loss=0.00849, over 3055868.95 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:40:34,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-29 13:40:43,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.76 vs. limit=10.0
2023-11-29 13:40:46,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3987206.6666666665, ans=0.2
2023-11-29 13:40:52,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3987206.6666666665, ans=22.5
2023-11-29 13:40:57,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3987273.3333333335, ans=0.0
2023-11-29 13:41:02,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3987273.3333333335, ans=0.05
2023-11-29 13:41:05,597 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598100
2023-11-29 13:41:07,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3987340.0, ans=0.1
2023-11-29 13:41:22,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.14 vs. limit=10.0
2023-11-29 13:41:26,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.187e+01 9.849e+01 1.048e+02 1.202e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 13:41:34,334 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 8950, loss[loss=0.06842, simple_loss=0.09265, pruned_loss=0.01455, audio_tagging_loss=0.00755, over 15330.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08969, pruned_loss=0.01169, audio_tagging_loss=0.008448, over 3054369.24 frames.
], batch size: 55, lr: 1.34e-03, grad_scale: 32.0
2023-11-29 13:41:36,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3987473.3333333335, ans=0.125
2023-11-29 13:41:59,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3987606.6666666665, ans=0.1
2023-11-29 13:42:02,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3987606.6666666665, ans=0.1
2023-11-29 13:42:04,741 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:42:07,435 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598150
2023-11-29 13:42:08,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3987606.6666666665, ans=0.125
2023-11-29 13:42:13,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3987673.3333333335, ans=0.0
2023-11-29 13:42:22,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3987740.0, ans=0.1
2023-11-29 13:42:28,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3987740.0, ans=0.125
2023-11-29 13:42:33,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3987740.0, ans=0.125
2023-11-29 13:42:35,646 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9000, loss[loss=0.06409, simple_loss=0.09028, pruned_loss=0.01026, audio_tagging_loss=0.008686, over 15440.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08989, pruned_loss=0.01179, audio_tagging_loss=0.008342, over 3042977.47 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:42:35,647 INFO [train_asr.py:1258] (1/4) Computing validation loss
2023-11-29 13:43:16,188 INFO [train_asr.py:1267] (1/4) Epoch 50, validation: loss=0.05899, simple_loss=0.05036, pruned_loss=0.005383, audio_tagging_loss=0.02843, over 4681554.00 frames.
2023-11-29 13:43:16,189 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB
2023-11-29 13:43:22,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3987806.6666666665, ans=0.0
2023-11-29 13:43:25,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987806.6666666665, ans=0.1
2023-11-29 13:43:45,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3987940.0, ans=0.0
2023-11-29 13:43:49,563 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598200
2023-11-29 13:44:07,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3988073.3333333335, ans=0.0
2023-11-29 13:44:12,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 9.277e+01 9.847e+01 1.044e+02 1.354e+02, threshold=1.969e+02, percent-clipped=0.0
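
Two different aggregations show up in these records. The tot_loss[...] figures are running averages over recent training batches, while the validation lines at batch 9000 aggregate over the full dev set (the same 4681554.00 frames each time); both appear to weight per-batch losses by frame count, and the validation components obey the same weighting checked earlier (0.5 x 0.05036 + 0.005383 + 0.02843 = 0.05899). A minimal sketch of frame-weighted aggregation, assuming this weighting scheme (the recipe's own metrics tracker may differ):

    def aggregate(batches):
        """batches: iterable of (loss_value, num_frames) pairs.
        Returns the frame-weighted average loss and the total frame count,
        as in the 'over N frames' figures in the log."""
        total_loss = 0.0
        total_frames = 0.0
        for loss_value, num_frames in batches:
            total_loss += loss_value * num_frames
            total_frames += num_frames
        return total_loss / total_frames, total_frames

    # e.g. three hypothetical validation batches
    avg, frames = aggregate([(0.065, 15440), (0.046, 15713), (0.067, 14711)])
    print(f"loss={avg:.5f} over {frames:.2f} frames")

It is also worth noticing that the validation audio_tagging_loss (0.02843) runs several times higher than the training running value (~0.0085) while the ASR terms are lower; the dev set here is the AudioSet eval cuts, so the mix of objectives differs from the muxed training stream.
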
2023-11-29 13:44:18,130 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9050, loss[loss=0.04591, simple_loss=0.05633, pruned_loss=0.006764, audio_tagging_loss=0.01098, over 15713.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08957, pruned_loss=0.01174, audio_tagging_loss=0.008379, over 3051959.92 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:44:46,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3988273.3333333335, ans=0.025
2023-11-29 13:44:50,779 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598250
2023-11-29 13:45:16,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3988406.6666666665, ans=0.05
2023-11-29 13:45:20,143 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9100, loss[loss=0.06742, simple_loss=0.09205, pruned_loss=0.01324, audio_tagging_loss=0.008156, over 14711.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08929, pruned_loss=0.01183, audio_tagging_loss=0.008353, over 3047533.19 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:45:35,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3988540.0, ans=0.125
2023-11-29 13:45:46,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0
2023-11-29 13:45:54,332 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598300
2023-11-29 13:45:55,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3988606.6666666665, ans=0.125
2023-11-29 13:46:05,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0
2023-11-29 13:46:07,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=12.0
2023-11-29 13:46:16,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.247e+01 9.830e+01 1.081e+02 1.321e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-29 13:46:22,635 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9150, loss[loss=0.06836, simple_loss=0.09028, pruned_loss=0.01489, audio_tagging_loss=0.008327, over 15616.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0892, pruned_loss=0.0119, audio_tagging_loss=0.008342, over 3051932.73 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:46:26,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0
2023-11-29 13:46:56,394 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598350
2023-11-29 13:47:11,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3989073.3333333335, ans=0.0
2023-11-29 13:47:22,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0
2023-11-29 13:47:25,271 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9200, loss[loss=0.03579, simple_loss=0.03956, pruned_loss=0.004479, audio_tagging_loss=0.01153, over 14458.00 frames. ], tot_loss[loss=0.06398, simple_loss=0.08779, pruned_loss=0.0117, audio_tagging_loss=0.008387, over 3053492.37 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 32.0
2023-11-29 13:47:28,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3989140.0, ans=0.125
2023-11-29 13:47:47,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3989206.6666666665, ans=0.2
2023-11-29 13:47:51,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3989273.3333333335, ans=0.0
2023-11-29 13:47:55,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3989273.3333333335, ans=0.0
2023-11-29 13:47:57,976 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598400
2023-11-29 13:48:00,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3989340.0, ans=0.125
2023-11-29 13:48:07,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3989340.0, ans=0.0
2023-11-29 13:48:09,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3989340.0, ans=0.0
2023-11-29 13:48:14,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3989406.6666666665, ans=0.125
2023-11-29 13:48:21,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.841e+01 9.563e+01 1.025e+02 1.500e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-29 13:48:26,002 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9250, loss[loss=0.06114, simple_loss=0.08461, pruned_loss=0.00842, audio_tagging_loss=0.01041, over 15357.00 frames. ], tot_loss[loss=0.0635, simple_loss=0.0873, pruned_loss=0.01145, audio_tagging_loss=0.008403, over 3063332.23 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:48:33,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3989473.3333333335, ans=0.125
2023-11-29 13:48:40,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3989540.0, ans=0.125
2023-11-29 13:48:45,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3989540.0, ans=0.0
2023-11-29 13:49:00,800 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598450
2023-11-29 13:49:12,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3989673.3333333335, ans=0.125
2023-11-29 13:49:14,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.52 vs. limit=22.5
2023-11-29 13:49:23,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3989740.0, ans=0.125
2023-11-29 13:49:25,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3989740.0, ans=0.125
2023-11-29 13:49:28,985 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9300, loss[loss=0.05766, simple_loss=0.08245, pruned_loss=0.007568, audio_tagging_loss=0.008866, over 16593.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08824, pruned_loss=0.01164, audio_tagging_loss=0.00842, over 3065190.00 frames. ], batch size: 63, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:49:44,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3989873.3333333335, ans=0.2
2023-11-29 13:49:58,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5
2023-11-29 13:50:02,644 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598500
2023-11-29 13:50:08,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0
2023-11-29 13:50:23,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3990073.3333333335, ans=0.125
2023-11-29 13:50:25,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 9.186e+01 9.880e+01 1.044e+02 1.365e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-29 13:50:30,902 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9350, loss[loss=0.05245, simple_loss=0.07385, pruned_loss=0.007311, audio_tagging_loss=0.008207, over 14424.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.088, pruned_loss=0.01164, audio_tagging_loss=0.008423, over 3061155.38 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:50:57,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3990273.3333333335, ans=0.0
2023-11-29 13:50:59,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0
2023-11-29 13:51:04,939 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598550
2023-11-29 13:51:10,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0
2023-11-29 13:51:33,521 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9400, loss[loss=0.05401, simple_loss=0.07768, pruned_loss=0.006298, audio_tagging_loss=0.008875, over 15309.00 frames. ], tot_loss[loss=0.06338, simple_loss=0.0867, pruned_loss=0.0115, audio_tagging_loss=0.008531, over 3061146.14 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:52:06,962 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598600
2023-11-29 13:52:07,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0
2023-11-29 13:52:08,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0
2023-11-29 13:52:26,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=8.0
2023-11-29 13:52:31,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 9.140e+01 9.680e+01 1.042e+02 1.691e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-29 13:52:33,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3990740.0, ans=0.125
2023-11-29 13:52:33,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3990740.0, ans=0.125
2023-11-29 13:52:35,724 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9450, loss[loss=0.06093, simple_loss=0.0816, pruned_loss=0.01323, audio_tagging_loss=0.006905, over 14673.00 frames. ], tot_loss[loss=0.06394, simple_loss=0.0876, pruned_loss=0.01164, audio_tagging_loss=0.008506, over 3059728.23 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:52:36,923 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:52:55,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3990873.3333333335, ans=0.125
2023-11-29 13:52:58,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0
2023-11-29 13:53:03,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3990940.0, ans=0.0
2023-11-29 13:53:07,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3990940.0, ans=0.2
2023-11-29 13:53:08,773 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598650
2023-11-29 13:53:35,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0
2023-11-29 13:53:37,331 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9500, loss[loss=0.09017, simple_loss=0.122, pruned_loss=0.01952, audio_tagging_loss=0.009656, over 13684.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08752, pruned_loss=0.01161, audio_tagging_loss=0.008618, over 3054529.67 frames. ], batch size: 51, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:54:10,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598700
2023-11-29 13:54:33,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.161e+01 9.773e+01 1.049e+02 1.216e+02, threshold=1.955e+02, percent-clipped=0.0
], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:54:40,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3991473.3333333335, ans=0.015 2023-11-29 13:54:41,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3991473.3333333335, ans=0.1 2023-11-29 13:54:54,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3991540.0, ans=0.2 2023-11-29 13:55:11,989 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598750 2023-11-29 13:55:21,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3991673.3333333335, ans=0.0 2023-11-29 13:55:30,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3991740.0, ans=0.125 2023-11-29 13:55:36,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3991740.0, ans=10.0 2023-11-29 13:55:40,318 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9600, loss[loss=0.06757, simple_loss=0.09317, pruned_loss=0.01361, audio_tagging_loss=0.007383, over 15142.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08884, pruned_loss=0.01165, audio_tagging_loss=0.008766, over 3054691.23 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:56:01,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3991873.3333333335, ans=0.04949747468305833 2023-11-29 13:56:12,541 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598800 2023-11-29 13:56:22,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-29 13:56:37,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 9.142e+01 9.552e+01 1.023e+02 1.315e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-29 13:56:38,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3992073.3333333335, ans=0.2 2023-11-29 13:56:41,284 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9650, loss[loss=0.04973, simple_loss=0.06918, pruned_loss=0.007279, audio_tagging_loss=0.007858, over 14461.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08849, pruned_loss=0.01168, audio_tagging_loss=0.008717, over 3044668.38 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:57:10,245 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:57:15,238 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598850 2023-11-29 13:57:26,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3992340.0, ans=0.125 2023-11-29 13:57:31,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3992406.6666666665, ans=0.125 2023-11-29 13:57:42,293 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9700, loss[loss=0.06582, simple_loss=0.09263, pruned_loss=0.01192, audio_tagging_loss=0.007579, over 14543.00 frames. 
], tot_loss[loss=0.06456, simple_loss=0.08861, pruned_loss=0.01168, audio_tagging_loss=0.00858, over 3045910.85 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:57:43,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3992473.3333333335, ans=0.0 2023-11-29 13:58:01,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3992540.0, ans=0.125 2023-11-29 13:58:06,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=22.5 2023-11-29 13:58:15,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598900 2023-11-29 13:58:19,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3992673.3333333335, ans=0.04949747468305833 2023-11-29 13:58:21,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3992673.3333333335, ans=0.125 2023-11-29 13:58:23,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3992673.3333333335, ans=0.0 2023-11-29 13:58:28,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3992673.3333333335, ans=0.125 2023-11-29 13:58:40,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3992740.0, ans=0.125 2023-11-29 13:58:41,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.286e+01 9.993e+01 1.057e+02 1.299e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 13:58:44,674 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9750, loss[loss=0.0866, simple_loss=0.1257, pruned_loss=0.0182, audio_tagging_loss=0.005539, over 16063.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.0889, pruned_loss=0.01179, audio_tagging_loss=0.008456, over 3045512.27 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:58:50,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2023-11-29 13:58:56,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2023-11-29 13:58:57,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2023-11-29 13:59:16,789 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 598950 2023-11-29 13:59:43,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0 2023-11-29 13:59:44,643 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9800, loss[loss=0.07152, simple_loss=0.1031, pruned_loss=0.01127, audio_tagging_loss=0.008706, over 15191.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08938, pruned_loss=0.01174, audio_tagging_loss=0.008419, over 3042841.82 frames. 
], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:59:49,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2023-11-29 13:59:57,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3993206.6666666665, ans=0.125 2023-11-29 14:00:18,825 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599000 2023-11-29 14:00:20,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3993273.3333333335, ans=0.1 2023-11-29 14:00:43,069 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:00:44,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.016e+01 9.640e+01 1.052e+02 1.257e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 14:00:46,483 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9850, loss[loss=0.05947, simple_loss=0.07301, pruned_loss=0.01101, audio_tagging_loss=0.01195, over 14246.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08933, pruned_loss=0.01182, audio_tagging_loss=0.008369, over 3043571.41 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:00:49,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3993473.3333333335, ans=0.125 2023-11-29 14:00:53,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3993473.3333333335, ans=0.125 2023-11-29 14:00:54,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3993473.3333333335, ans=0.1 2023-11-29 14:01:17,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3993606.6666666665, ans=0.125 2023-11-29 14:01:20,021 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599050 2023-11-29 14:01:33,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3993673.3333333335, ans=0.125 2023-11-29 14:01:43,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3993740.0, ans=0.07 2023-11-29 14:01:47,706 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9900, loss[loss=0.06092, simple_loss=0.07826, pruned_loss=0.01336, audio_tagging_loss=0.008425, over 15367.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08986, pruned_loss=0.01176, audio_tagging_loss=0.008379, over 3045413.26 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:01:51,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3993806.6666666665, ans=0.0 2023-11-29 14:01:56,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.82 vs. 
limit=15.0 2023-11-29 14:01:58,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3993806.6666666665, ans=0.0 2023-11-29 14:01:59,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3993873.3333333335, ans=0.125 2023-11-29 14:02:04,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3993873.3333333335, ans=0.125 2023-11-29 14:02:04,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.67 vs. limit=22.5 2023-11-29 14:02:04,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-29 14:02:06,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3993873.3333333335, ans=0.125 2023-11-29 14:02:20,701 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599100 2023-11-29 14:02:27,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3994006.6666666665, ans=0.95 2023-11-29 14:02:44,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3994073.3333333335, ans=0.2 2023-11-29 14:02:46,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 9.233e+01 9.702e+01 1.026e+02 1.352e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 14:02:49,334 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 9950, loss[loss=0.06149, simple_loss=0.07912, pruned_loss=0.01113, audio_tagging_loss=0.01081, over 14783.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08966, pruned_loss=0.01174, audio_tagging_loss=0.008365, over 3047292.01 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:02:51,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3994140.0, ans=0.125 2023-11-29 14:03:17,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3994273.3333333335, ans=0.0 2023-11-29 14:03:18,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3994273.3333333335, ans=0.0 2023-11-29 14:03:21,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3994273.3333333335, ans=0.125 2023-11-29 14:03:22,170 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599150 2023-11-29 14:03:22,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5 2023-11-29 14:03:37,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3994406.6666666665, ans=0.125 2023-11-29 14:03:48,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.63 vs. 
limit=22.5 2023-11-29 14:03:51,250 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10000, loss[loss=0.05721, simple_loss=0.0791, pruned_loss=0.009489, audio_tagging_loss=0.008167, over 15087.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08938, pruned_loss=0.01168, audio_tagging_loss=0.008341, over 3046354.04 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:04:03,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3994540.0, ans=0.125 2023-11-29 14:04:05,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-29 14:04:13,897 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:04:24,910 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599200 2023-11-29 14:04:43,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3994740.0, ans=0.0 2023-11-29 14:04:50,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3994740.0, ans=0.125 2023-11-29 14:04:50,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.059e+01 9.244e+01 9.877e+01 1.049e+02 1.463e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 14:04:51,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3994740.0, ans=0.125 2023-11-29 14:04:53,624 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10050, loss[loss=0.03725, simple_loss=0.0469, pruned_loss=0.003446, audio_tagging_loss=0.01035, over 14968.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08889, pruned_loss=0.01163, audio_tagging_loss=0.008372, over 3044645.38 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:05:16,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3994873.3333333335, ans=0.0 2023-11-29 14:05:16,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3994873.3333333335, ans=15.0 2023-11-29 14:05:26,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3994940.0, ans=0.1 2023-11-29 14:05:27,202 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599250 2023-11-29 14:05:34,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3995006.6666666665, ans=0.0 2023-11-29 14:05:36,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3995006.6666666665, ans=0.5 2023-11-29 14:05:56,135 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10100, loss[loss=0.05703, simple_loss=0.07627, pruned_loss=0.009856, audio_tagging_loss=0.009037, over 15162.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08957, pruned_loss=0.01166, audio_tagging_loss=0.008352, over 3050876.95 frames. 
], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:05:59,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3995140.0, ans=0.2 2023-11-29 14:06:09,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3995206.6666666665, ans=0.0 2023-11-29 14:06:12,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3995206.6666666665, ans=0.1 2023-11-29 14:06:14,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3995206.6666666665, ans=0.2 2023-11-29 14:06:17,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2023-11-29 14:06:28,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3995273.3333333335, ans=0.1 2023-11-29 14:06:29,076 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599300 2023-11-29 14:06:48,887 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:06:54,625 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.067e+01 9.808e+01 1.052e+02 1.257e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:06:57,693 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10150, loss[loss=0.07934, simple_loss=0.1118, pruned_loss=0.01668, audio_tagging_loss=0.00675, over 15307.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08827, pruned_loss=0.0115, audio_tagging_loss=0.008577, over 3054258.82 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:07:00,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-29 14:07:22,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2023-11-29 14:07:23,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3995606.6666666665, ans=0.125 2023-11-29 14:07:27,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3995606.6666666665, ans=0.1 2023-11-29 14:07:29,286 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
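These WARNING lines make the exclusion criterion visible: each of these one-second AudioSet cuts has 100 feature frames, only 23 frames after the roughly 4x convolutional subsampling, but a 24-token placeholder transcript, and a transducer cannot align more output tokens than it has encoder frames. A minimal sketch of such a filter, assuming a front end whose length formula reproduces the logged 100 -> 23; the helper names are hypothetical:

    # Sketch of the rule implied by the WARNING above: keep a cut only if
    # the subsampled frame count can cover the token count. The length
    # formula is an assumption chosen to match the logged before/after
    # frame counts; it is not taken from the training code.

    def subsampled_num_frames(num_frames: int) -> int:
        # Two stride-2 stages, as in a typical ~4x convolutional front end.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return subsampled_num_frames(num_frames) >= num_tokens

    assert subsampled_num_frames(100) == 23  # matches the WARNING above
    assert not keep_cut(100, 24)             # 24 tokens > 23 frames: excluded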
2023-11-29 14:07:31,218 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599350 2023-11-29 14:07:31,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3995606.6666666665, ans=0.1 2023-11-29 14:07:55,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3995740.0, ans=0.125 2023-11-29 14:07:58,623 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10200, loss[loss=0.06825, simple_loss=0.08972, pruned_loss=0.01661, audio_tagging_loss=0.006773, over 14616.00 frames. ], tot_loss[loss=0.06378, simple_loss=0.08761, pruned_loss=0.01135, audio_tagging_loss=0.008621, over 3050926.40 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:08:09,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3995806.6666666665, ans=0.2 2023-11-29 14:08:25,180 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:08:25,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3995940.0, ans=0.0 2023-11-29 14:08:32,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599400 2023-11-29 14:08:38,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=22.5 2023-11-29 14:08:44,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3996006.6666666665, ans=0.125 2023-11-29 14:08:59,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 9.205e+01 9.737e+01 1.021e+02 2.393e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 14:09:01,493 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10250, loss[loss=0.06348, simple_loss=0.08468, pruned_loss=0.01011, audio_tagging_loss=0.01103, over 14551.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08809, pruned_loss=0.01155, audio_tagging_loss=0.008739, over 3048944.50 frames.
], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:09:05,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3996140.0, ans=0.125 2023-11-29 14:09:18,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3996206.6666666665, ans=0.125 2023-11-29 14:09:22,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3996206.6666666665, ans=0.1 2023-11-29 14:09:30,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3996273.3333333335, ans=0.0 2023-11-29 14:09:33,763 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599450 2023-11-29 14:09:51,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3996406.6666666665, ans=0.1 2023-11-29 14:10:03,464 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10300, loss[loss=0.0725, simple_loss=0.1035, pruned_loss=0.01185, audio_tagging_loss=0.008928, over 16818.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08833, pruned_loss=0.01171, audio_tagging_loss=0.008669, over 3050882.83 frames. ], batch size: 62, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:10:21,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3996540.0, ans=0.1 2023-11-29 14:10:22,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3996540.0, ans=0.125 2023-11-29 14:10:30,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3996606.6666666665, ans=0.0 2023-11-29 14:10:38,044 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599500 2023-11-29 14:10:53,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3996740.0, ans=0.125 2023-11-29 14:10:56,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=22.5 2023-11-29 14:10:58,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3996740.0, ans=0.0 2023-11-29 14:11:02,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3996740.0, ans=0.125 2023-11-29 14:11:03,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 9.141e+01 9.728e+01 1.059e+02 1.776e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 14:11:06,030 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10350, loss[loss=0.08373, simple_loss=0.1145, pruned_loss=0.01592, audio_tagging_loss=0.01058, over 15345.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08966, pruned_loss=0.01195, audio_tagging_loss=0.008739, over 3051813.01 frames. 
], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:11:34,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3996940.0, ans=0.125 2023-11-29 14:11:40,506 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599550 2023-11-29 14:12:04,146 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:12:08,471 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10400, loss[loss=0.05239, simple_loss=0.06864, pruned_loss=0.009116, audio_tagging_loss=0.008949, over 13963.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08985, pruned_loss=0.01195, audio_tagging_loss=0.008764, over 3047300.46 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:12:14,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3997140.0, ans=0.125 2023-11-29 14:12:20,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3997206.6666666665, ans=0.0 2023-11-29 14:12:42,196 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599600 2023-11-29 14:13:07,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3997406.6666666665, ans=0.0 2023-11-29 14:13:08,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.101e+01 9.809e+01 1.030e+02 1.340e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:13:10,959 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10450, loss[loss=0.07015, simple_loss=0.0933, pruned_loss=0.01444, audio_tagging_loss=0.009059, over 16126.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08986, pruned_loss=0.01197, audio_tagging_loss=0.008711, over 3052643.38 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:13:12,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3997473.3333333335, ans=0.0 2023-11-29 14:13:14,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3997473.3333333335, ans=0.0 2023-11-29 14:13:27,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-29 14:13:44,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-11-29 14:13:44,790 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599650 2023-11-29 14:14:05,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3997740.0, ans=0.125 2023-11-29 14:14:09,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3997740.0, ans=0.2 2023-11-29 14:14:12,913 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10500, loss[loss=0.05937, simple_loss=0.08643, pruned_loss=0.007513, audio_tagging_loss=0.008644, over 14738.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08889, pruned_loss=0.0118, audio_tagging_loss=0.008592, over 3041859.09 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:14:39,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997940.0, ans=0.1 2023-11-29 14:14:40,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.03 vs. limit=10.0 2023-11-29 14:14:45,978 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599700 2023-11-29 14:14:56,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-29 14:15:11,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3998073.3333333335, ans=0.125 2023-11-29 14:15:14,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.178e+01 9.890e+01 1.072e+02 1.434e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-29 14:15:15,328 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10550, loss[loss=0.07754, simple_loss=0.1106, pruned_loss=0.01579, audio_tagging_loss=0.006427, over 15159.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08903, pruned_loss=0.01187, audio_tagging_loss=0.008535, over 3046749.94 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:15:31,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3998206.6666666665, ans=0.0 2023-11-29 14:15:36,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.01 vs. limit=10.0 2023-11-29 14:15:48,296 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599750 2023-11-29 14:15:50,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3998340.0, ans=0.125 2023-11-29 14:16:08,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3998406.6666666665, ans=0.1 2023-11-29 14:16:11,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3998406.6666666665, ans=0.1 2023-11-29 14:16:16,387 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10600, loss[loss=0.05695, simple_loss=0.07801, pruned_loss=0.01136, audio_tagging_loss=0.006578, over 15369.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08771, pruned_loss=0.01168, audio_tagging_loss=0.008506, over 3036042.88 frames. 
], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:16:16,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3998473.3333333335, ans=0.125 2023-11-29 14:16:17,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3998473.3333333335, ans=0.0 2023-11-29 14:16:23,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3998473.3333333335, ans=0.1 2023-11-29 14:16:31,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3998540.0, ans=0.0 2023-11-29 14:16:45,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3998606.6666666665, ans=0.2 2023-11-29 14:16:48,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3998606.6666666665, ans=0.0 2023-11-29 14:16:50,404 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599800 2023-11-29 14:16:55,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2023-11-29 14:17:12,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3998740.0, ans=0.125 2023-11-29 14:17:14,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3998740.0, ans=0.125 2023-11-29 14:17:17,870 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.773e+01 1.042e+02 1.293e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-29 14:17:19,091 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10650, loss[loss=0.06112, simple_loss=0.08955, pruned_loss=0.009669, audio_tagging_loss=0.006681, over 15661.00 frames. ], tot_loss[loss=0.06374, simple_loss=0.08772, pruned_loss=0.01144, audio_tagging_loss=0.008439, over 3038312.42 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:17:27,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-29 14:17:32,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3998873.3333333335, ans=0.1 2023-11-29 14:17:48,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3998940.0, ans=0.125 2023-11-29 14:17:51,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3998940.0, ans=0.1 2023-11-29 14:17:51,958 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599850 2023-11-29 14:17:56,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3999006.6666666665, ans=0.07 2023-11-29 14:18:09,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. 
limit=22.5 2023-11-29 14:18:10,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3999073.3333333335, ans=0.1 2023-11-29 14:18:17,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3999073.3333333335, ans=0.2 2023-11-29 14:18:20,970 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10700, loss[loss=0.05804, simple_loss=0.08191, pruned_loss=0.007581, audio_tagging_loss=0.009507, over 14624.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.08779, pruned_loss=0.01142, audio_tagging_loss=0.008477, over 3038182.81 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:18:31,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3999206.6666666665, ans=0.05 2023-11-29 14:18:41,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3999206.6666666665, ans=0.1 2023-11-29 14:18:54,319 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599900 2023-11-29 14:18:58,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3999340.0, ans=0.125 2023-11-29 14:19:01,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3999340.0, ans=0.1 2023-11-29 14:19:03,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3999340.0, ans=0.1 2023-11-29 14:19:09,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3999406.6666666665, ans=0.125 2023-11-29 14:19:20,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3999406.6666666665, ans=0.2 2023-11-29 14:19:21,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.059e+01 9.669e+01 1.032e+02 1.449e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 14:19:22,394 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10750, loss[loss=0.06219, simple_loss=0.09088, pruned_loss=0.009807, audio_tagging_loss=0.006938, over 15536.00 frames. ], tot_loss[loss=0.06346, simple_loss=0.08719, pruned_loss=0.01135, audio_tagging_loss=0.008513, over 3041269.68 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:19:56,711 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 599950 2023-11-29 14:20:10,743 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:20:12,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3999740.0, ans=0.125 2023-11-29 14:20:23,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5 2023-11-29 14:20:23,921 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10800, loss[loss=0.07807, simple_loss=0.1111, pruned_loss=0.01413, audio_tagging_loss=0.008378, over 16338.00 frames. ], tot_loss[loss=0.06361, simple_loss=0.08755, pruned_loss=0.0114, audio_tagging_loss=0.008426, over 3038901.94 frames. 
], batch size: 62, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:20:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=3999806.6666666665, ans=12.0 2023-11-29 14:20:37,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3999873.3333333335, ans=0.0 2023-11-29 14:20:57,333 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600000 2023-11-29 14:20:57,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999940.0, ans=0.1 2023-11-29 14:21:11,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2023-11-29 14:21:15,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4000073.3333333335, ans=0.2 2023-11-29 14:21:24,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4000073.3333333335, ans=0.125 2023-11-29 14:21:28,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.037e+01 9.839e+01 1.048e+02 1.354e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 14:21:28,940 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10850, loss[loss=0.0692, simple_loss=0.09615, pruned_loss=0.01264, audio_tagging_loss=0.008482, over 14766.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08815, pruned_loss=0.01152, audio_tagging_loss=0.008373, over 3040256.67 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:21:40,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4000206.6666666665, ans=0.125 2023-11-29 14:21:43,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2023-11-29 14:22:02,304 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600050 2023-11-29 14:22:03,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4000273.3333333335, ans=0.125 2023-11-29 14:22:08,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4000340.0, ans=0.125 2023-11-29 14:22:30,832 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10900, loss[loss=0.05444, simple_loss=0.06904, pruned_loss=0.01017, audio_tagging_loss=0.009751, over 14771.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08763, pruned_loss=0.01152, audio_tagging_loss=0.00853, over 3041338.90 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:22:30,873 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
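The scaling.py:213 records are ScheduledFloat reports: module constants such as bypass scale_min, balancer probabilities and skip rates whose values follow a schedule over the global batch count, and which at batch_count around 4.0e6 have long since settled at their final values (ans=0.2, 0.125, 0.0 and so on). A minimal sketch of a piecewise-linear schedule of this kind, with made-up breakpoints; it is not the actual scaling.py implementation:

    # Sketch of a batch-count-dependent scheduled value, the mechanism the
    # ScheduledFloat lines above report. Linear interpolation between
    # (batch_count, value) breakpoints, clamped at both ends; the schedule
    # used in the assertions is a made-up example.

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        points = sorted(schedule)
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]  # past the last breakpoint: hold the final value

    # A skip rate decaying from 0.5 to 0.0 over the first 4000 batches has
    # been pinned at 0.0 for every batch_count (~4e6) seen in this log.
    assert scheduled_float(0.0, [(0.0, 0.5), (4000.0, 0.0)]) == 0.5
    assert scheduled_float(3999873.33, [(0.0, 0.5), (4000.0, 0.0)]) == 0.0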
2023-11-29 14:22:45,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. limit=10.0 2023-11-29 14:23:03,936 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600100 2023-11-29 14:23:20,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2023-11-29 14:23:22,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4000740.0, ans=0.2 2023-11-29 14:23:30,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4000740.0, ans=15.0 2023-11-29 14:23:32,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.967e+01 9.295e+01 9.770e+01 1.037e+02 1.464e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 14:23:32,280 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 10950, loss[loss=0.06441, simple_loss=0.09049, pruned_loss=0.009317, audio_tagging_loss=0.009847, over 14000.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08817, pruned_loss=0.01166, audio_tagging_loss=0.008539, over 3038883.82 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:23:34,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2023-11-29 14:23:44,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4000873.3333333335, ans=0.125 2023-11-29 14:23:55,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4000873.3333333335, ans=0.125 2023-11-29 14:24:05,806 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600150 2023-11-29 14:24:18,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4001006.6666666665, ans=0.0 2023-11-29 14:24:34,445 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11000, loss[loss=0.05827, simple_loss=0.07646, pruned_loss=0.01176, audio_tagging_loss=0.008276, over 14893.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08853, pruned_loss=0.01161, audio_tagging_loss=0.008567, over 3051069.52 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:24:48,159 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-29 14:25:07,612 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600200 2023-11-29 14:25:25,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4001406.6666666665, ans=0.04949747468305833 2023-11-29 14:25:35,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4001473.3333333335, ans=0.0 2023-11-29 14:25:36,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.890e+01 9.536e+01 1.018e+02 1.298e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 14:25:36,618 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11050, loss[loss=0.05367, simple_loss=0.07254, pruned_loss=0.0101, audio_tagging_loss=0.007295, over 14235.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08861, pruned_loss=0.01171, audio_tagging_loss=0.008672, over 3044257.53 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:25:40,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4001473.3333333335, ans=0.2 2023-11-29 14:25:47,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4001540.0, ans=0.125 2023-11-29 14:25:49,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4001540.0, ans=0.5 2023-11-29 14:25:53,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4001540.0, ans=0.125 2023-11-29 14:25:57,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4001540.0, ans=0.0 2023-11-29 14:26:04,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4001606.6666666665, ans=0.0 2023-11-29 14:26:09,898 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600250 2023-11-29 14:26:32,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4001740.0, ans=0.2 2023-11-29 14:26:33,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001740.0, ans=0.1 2023-11-29 14:26:38,317 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11100, loss[loss=0.07376, simple_loss=0.1026, pruned_loss=0.0132, audio_tagging_loss=0.009256, over 15551.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08808, pruned_loss=0.01172, audio_tagging_loss=0.00874, over 3040183.75 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:27:04,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-29 14:27:11,746 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600300 2023-11-29 14:27:29,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4002073.3333333335, ans=0.0 2023-11-29 14:27:30,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. 
limit=15.0 2023-11-29 14:27:40,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.265e+01 9.785e+01 1.064e+02 1.288e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 14:27:40,231 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11150, loss[loss=0.06828, simple_loss=0.09719, pruned_loss=0.01063, audio_tagging_loss=0.009059, over 15376.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08851, pruned_loss=0.01161, audio_tagging_loss=0.00881, over 3041959.08 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:27:45,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4002140.0, ans=0.125 2023-11-29 14:27:50,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-11-29 14:27:58,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-29 14:28:13,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600350 2023-11-29 14:28:21,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-29 14:28:32,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4002406.6666666665, ans=0.07 2023-11-29 14:28:41,708 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11200, loss[loss=0.04509, simple_loss=0.05724, pruned_loss=0.007616, audio_tagging_loss=0.008856, over 14926.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08871, pruned_loss=0.01162, audio_tagging_loss=0.008835, over 3040281.32 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:29:00,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4002540.0, ans=0.125 2023-11-29 14:29:14,796 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600400 2023-11-29 14:29:21,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-29 14:29:27,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2023-11-29 14:29:29,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4002673.3333333335, ans=0.0 2023-11-29 14:29:43,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 9.140e+01 9.616e+01 1.036e+02 1.306e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 14:29:43,086 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11250, loss[loss=0.05793, simple_loss=0.079, pruned_loss=0.00882, audio_tagging_loss=0.009605, over 15898.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08789, pruned_loss=0.01163, audio_tagging_loss=0.008801, over 3040960.37 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:29:51,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. 
limit=15.0 2023-11-29 14:30:01,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4002873.3333333335, ans=0.125 2023-11-29 14:30:17,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600450 2023-11-29 14:30:21,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4003006.6666666665, ans=0.0 2023-11-29 14:30:44,775 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11300, loss[loss=0.0761, simple_loss=0.1031, pruned_loss=0.01639, audio_tagging_loss=0.008187, over 15132.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.088, pruned_loss=0.01161, audio_tagging_loss=0.008667, over 3044958.49 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:30:52,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=22.5 2023-11-29 14:31:13,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4003273.3333333335, ans=0.0 2023-11-29 14:31:18,267 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600500 2023-11-29 14:31:46,839 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11350, loss[loss=0.07833, simple_loss=0.1143, pruned_loss=0.01506, audio_tagging_loss=0.00612, over 14796.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08857, pruned_loss=0.01175, audio_tagging_loss=0.008526, over 3044853.15 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:31:49,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.251e+01 9.881e+01 1.050e+02 2.034e+02, threshold=1.976e+02, percent-clipped=1.0 2023-11-29 14:31:56,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0 2023-11-29 14:32:12,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2023-11-29 14:32:16,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4003606.6666666665, ans=0.125 2023-11-29 14:32:19,809 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600550 2023-11-29 14:32:48,402 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11400, loss[loss=0.07098, simple_loss=0.1018, pruned_loss=0.01059, audio_tagging_loss=0.009468, over 14994.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08792, pruned_loss=0.01156, audio_tagging_loss=0.008522, over 3038930.07 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:32:48,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4003806.6666666665, ans=0.0 2023-11-29 14:32:56,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4003806.6666666665, ans=0.125 2023-11-29 14:33:22,123 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600600 2023-11-29 14:33:49,946 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11450, loss[loss=0.0624, simple_loss=0.08004, pruned_loss=0.01297, audio_tagging_loss=0.009416, over 16169.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.0883, pruned_loss=0.01164, audio_tagging_loss=0.008513, over 3038852.19 frames. 
], batch size: 61, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:33:52,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.290e+01 9.810e+01 1.057e+02 1.472e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:34:00,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4004140.0, ans=0.125 2023-11-29 14:34:24,116 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600650 2023-11-29 14:34:31,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2023-11-29 14:34:39,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4004406.6666666665, ans=0.0 2023-11-29 14:34:39,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4004406.6666666665, ans=0.2 2023-11-29 14:34:53,816 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11500, loss[loss=0.06913, simple_loss=0.09746, pruned_loss=0.0119, audio_tagging_loss=0.008499, over 14427.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08853, pruned_loss=0.01163, audio_tagging_loss=0.008487, over 3045844.24 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:34:59,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4004473.3333333335, ans=0.125 2023-11-29 14:35:00,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4004473.3333333335, ans=0.125 2023-11-29 14:35:12,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4004540.0, ans=0.2 2023-11-29 14:35:26,649 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600700 2023-11-29 14:35:46,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4004740.0, ans=0.2 2023-11-29 14:35:51,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4004740.0, ans=0.0 2023-11-29 14:35:55,393 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11550, loss[loss=0.0568, simple_loss=0.07801, pruned_loss=0.009392, audio_tagging_loss=0.008396, over 16092.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08887, pruned_loss=0.01172, audio_tagging_loss=0.00845, over 3041694.71 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:35:57,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.007e+01 9.636e+01 1.040e+02 1.609e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 14:36:02,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4004806.6666666665, ans=0.07 2023-11-29 14:36:19,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4004940.0, ans=0.0 2023-11-29 14:36:25,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. 
limit=15.0 2023-11-29 14:36:28,692 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600750 2023-11-29 14:36:33,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4005006.6666666665, ans=0.125 2023-11-29 14:36:36,908 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:36:37,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4005006.6666666665, ans=0.125 2023-11-29 14:36:39,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4005006.6666666665, ans=0.125 2023-11-29 14:36:40,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4005006.6666666665, ans=0.125 2023-11-29 14:36:56,758 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11600, loss[loss=0.06122, simple_loss=0.08908, pruned_loss=0.01026, audio_tagging_loss=0.006414, over 15446.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08929, pruned_loss=0.01164, audio_tagging_loss=0.008423, over 3041714.95 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:37:08,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2023-11-29 14:37:10,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4005206.6666666665, ans=0.125 2023-11-29 14:37:25,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4005273.3333333335, ans=0.0 2023-11-29 14:37:27,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4005273.3333333335, ans=0.125 2023-11-29 14:37:30,322 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600800 2023-11-29 14:37:34,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4005340.0, ans=0.125 2023-11-29 14:37:36,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4005340.0, ans=0.125 2023-11-29 14:37:43,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4005340.0, ans=0.125 2023-11-29 14:37:50,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4005406.6666666665, ans=0.2 2023-11-29 14:37:56,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4005406.6666666665, ans=0.5 2023-11-29 14:37:58,592 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11650, loss[loss=0.06099, simple_loss=0.08763, pruned_loss=0.008766, audio_tagging_loss=0.008405, over 14940.00 frames. 
], tot_loss[loss=0.06485, simple_loss=0.08924, pruned_loss=0.01175, audio_tagging_loss=0.008483, over 3044520.07 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:38:00,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.952e+01 9.263e+01 9.866e+01 1.051e+02 2.462e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-29 14:38:08,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4005473.3333333335, ans=0.125 2023-11-29 14:38:13,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=22.5 2023-11-29 14:38:18,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4005540.0, ans=0.0 2023-11-29 14:38:19,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4005540.0, ans=0.0 2023-11-29 14:38:20,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4005540.0, ans=0.125 2023-11-29 14:38:28,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4005606.6666666665, ans=0.125 2023-11-29 14:38:29,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4005606.6666666665, ans=0.0 2023-11-29 14:38:30,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=12.0 2023-11-29 14:38:32,164 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600850 2023-11-29 14:38:44,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.74 vs. limit=10.0 2023-11-29 14:38:50,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4005740.0, ans=22.5 2023-11-29 14:38:54,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4005740.0, ans=0.5 2023-11-29 14:38:59,959 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11700, loss[loss=0.06554, simple_loss=0.08999, pruned_loss=0.01277, audio_tagging_loss=0.007776, over 15473.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.0883, pruned_loss=0.01168, audio_tagging_loss=0.008575, over 3048096.00 frames. 
], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:39:08,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4005806.6666666665, ans=0.2 2023-11-29 14:39:11,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4005806.6666666665, ans=0.125 2023-11-29 14:39:15,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4005873.3333333335, ans=0.125 2023-11-29 14:39:22,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4005873.3333333335, ans=0.125 2023-11-29 14:39:28,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2023-11-29 14:39:28,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-29 14:39:32,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4005940.0, ans=0.125 2023-11-29 14:39:33,601 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600900 2023-11-29 14:39:33,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4005940.0, ans=0.125 2023-11-29 14:39:43,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4006006.6666666665, ans=0.125 2023-11-29 14:39:50,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4006073.3333333335, ans=0.125 2023-11-29 14:40:02,096 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11750, loss[loss=0.07231, simple_loss=0.0989, pruned_loss=0.01187, audio_tagging_loss=0.01099, over 15842.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.0887, pruned_loss=0.01188, audio_tagging_loss=0.008555, over 3052013.19 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:40:03,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4006140.0, ans=0.0 2023-11-29 14:40:05,481 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.067e+01 9.619e+01 1.045e+02 1.766e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 14:40:34,703 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 600950 2023-11-29 14:40:47,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4006340.0, ans=15.0 2023-11-29 14:41:02,856 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11800, loss[loss=0.06462, simple_loss=0.08815, pruned_loss=0.01022, audio_tagging_loss=0.01032, over 14477.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08777, pruned_loss=0.01177, audio_tagging_loss=0.008687, over 3047789.78 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:41:16,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4006540.0, ans=0.125 2023-11-29 14:41:16,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4006540.0, ans=0.125 2023-11-29 14:41:26,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=22.5 2023-11-29 14:41:29,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4006606.6666666665, ans=0.125 2023-11-29 14:41:35,973 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601000 2023-11-29 14:41:53,394 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:42:03,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4006806.6666666665, ans=0.0 2023-11-29 14:42:04,195 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11850, loss[loss=0.07324, simple_loss=0.1064, pruned_loss=0.01382, audio_tagging_loss=0.006204, over 14577.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08768, pruned_loss=0.01166, audio_tagging_loss=0.008772, over 3041584.14 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:42:07,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.068e+01 9.599e+01 1.030e+02 1.301e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 14:42:21,004 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:42:26,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4006873.3333333335, ans=0.125 2023-11-29 14:42:31,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4006940.0, ans=0.125 2023-11-29 14:42:37,802 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601050 2023-11-29 14:42:45,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4007006.6666666665, ans=0.0 2023-11-29 14:42:51,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4007006.6666666665, ans=0.015 2023-11-29 14:42:56,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4007073.3333333335, ans=0.1 2023-11-29 14:43:02,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4007073.3333333335, ans=0.04949747468305833 2023-11-29 14:43:05,831 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11900, loss[loss=0.0817, simple_loss=0.1168, pruned_loss=0.01688, audio_tagging_loss=0.006417, over 15044.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08743, pruned_loss=0.01169, audio_tagging_loss=0.00885, over 3038934.07 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:43:18,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2023-11-29 14:43:28,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4007206.6666666665, ans=0.5 2023-11-29 14:43:39,282 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601100 2023-11-29 14:43:55,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4007406.6666666665, ans=0.035 2023-11-29 14:43:55,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4007406.6666666665, ans=0.125 2023-11-29 14:44:01,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4007406.6666666665, ans=0.125 2023-11-29 14:44:04,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2023-11-29 14:44:07,718 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 11950, loss[loss=0.07683, simple_loss=0.1094, pruned_loss=0.01507, audio_tagging_loss=0.007044, over 15812.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08793, pruned_loss=0.01165, audio_tagging_loss=0.008894, over 3044907.06 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:44:11,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.998e+01 9.725e+01 1.040e+02 1.716e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 14:44:31,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-11-29 14:44:40,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601150 2023-11-29 14:44:41,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4007606.6666666665, ans=0.0 2023-11-29 14:45:06,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4007806.6666666665, ans=0.125 2023-11-29 14:45:07,573 INFO [train_asr.py:1235] (1/4) Epoch 50, batch 12000, loss[loss=0.07607, simple_loss=0.1008, pruned_loss=0.01517, audio_tagging_loss=0.01052, over 14745.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08801, pruned_loss=0.01174, audio_tagging_loss=0.008972, over 3050370.32 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:45:07,573 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 14:45:47,780 INFO [train_asr.py:1267] (1/4) Epoch 50, validation: loss=0.05813, simple_loss=0.05044, pruned_loss=0.005399, audio_tagging_loss=0.02752, over 4681554.00 frames. 2023-11-29 14:45:47,781 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 14:46:04,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4007873.3333333335, ans=0.0 2023-11-29 14:46:13,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2023-11-29 14:46:34,481 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 0, loss[loss=0.07755, simple_loss=0.09507, pruned_loss=0.01123, audio_tagging_loss=0.01879, over 14204.00 frames. 
], tot_loss[loss=0.07755, simple_loss=0.09507, pruned_loss=0.01123, audio_tagging_loss=0.01879, over 14204.00 frames. ], batch size: 54, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:46:34,482 INFO [train_asr.py:1258] (1/4) Computing validation loss 2023-11-29 14:46:56,036 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4625, 3.8045, 4.3647, 3.5136], device='cuda:1') 2023-11-29 14:47:11,086 INFO [train_asr.py:1267] (1/4) Epoch 51, validation: loss=0.05803, simple_loss=0.05046, pruned_loss=0.005398, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-29 14:47:11,086 INFO [train_asr.py:1268] (1/4) Maximum memory allocated so far is 25568MB 2023-11-29 14:47:13,626 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601200 2023-11-29 14:47:19,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0 2023-11-29 14:47:31,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4008040.0, ans=0.0 2023-11-29 14:47:41,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4008106.6666666665, ans=0.0 2023-11-29 14:47:45,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.507e+01 9.981e+01 1.081e+02 1.521e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-29 14:47:54,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4008173.3333333335, ans=0.125 2023-11-29 14:47:59,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4008240.0, ans=0.1 2023-11-29 14:48:13,503 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 50, loss[loss=0.05081, simple_loss=0.05371, pruned_loss=0.005719, audio_tagging_loss=0.01823, over 14311.00 frames. ], tot_loss[loss=0.07393, simple_loss=0.09175, pruned_loss=0.0118, audio_tagging_loss=0.01625, over 688570.19 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:48:15,997 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601250 2023-11-29 14:48:18,634 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:48:30,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4008373.3333333335, ans=0.125 2023-11-29 14:49:08,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4008573.3333333335, ans=0.0 2023-11-29 14:49:15,443 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 100, loss[loss=0.0648, simple_loss=0.08335, pruned_loss=0.008709, audio_tagging_loss=0.01441, over 15609.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.0879, pruned_loss=0.01125, audio_tagging_loss=0.01569, over 1206012.87 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:49:17,853 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601300 2023-11-29 14:49:17,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4008640.0, ans=0.0 2023-11-29 14:49:30,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.75 vs. 
limit=22.5 2023-11-29 14:49:46,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4008773.3333333335, ans=0.0 2023-11-29 14:49:51,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.732e+01 9.896e+01 1.042e+02 1.115e+02 1.364e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-29 14:50:13,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0 2023-11-29 14:50:17,053 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 150, loss[loss=0.05577, simple_loss=0.06821, pruned_loss=0.01039, audio_tagging_loss=0.01127, over 14393.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.08808, pruned_loss=0.01151, audio_tagging_loss=0.01407, over 1614913.41 frames. ], batch size: 54, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:50:19,570 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601350 2023-11-29 14:50:41,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2023-11-29 14:51:14,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4009240.0, ans=0.125 2023-11-29 14:51:19,950 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 200, loss[loss=0.06423, simple_loss=0.09264, pruned_loss=0.009764, audio_tagging_loss=0.008139, over 15176.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.08865, pruned_loss=0.01154, audio_tagging_loss=0.01251, over 1937639.24 frames. ], batch size: 59, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:51:22,403 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601400 2023-11-29 14:51:30,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4009306.6666666665, ans=0.0 2023-11-29 14:51:55,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.139e+01 9.906e+01 1.061e+02 1.460e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 14:52:06,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4009506.6666666665, ans=0.125 2023-11-29 14:52:09,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4009573.3333333335, ans=0.125 2023-11-29 14:52:21,837 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 250, loss[loss=0.09647, simple_loss=0.1366, pruned_loss=0.02023, audio_tagging_loss=0.00792, over 16022.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.08868, pruned_loss=0.01168, audio_tagging_loss=0.01137, over 2185178.76 frames. 
], batch size: 56, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:52:24,316 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601450 2023-11-29 14:52:49,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4009773.3333333335, ans=0.5 2023-11-29 14:52:50,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4009773.3333333335, ans=0.1 2023-11-29 14:52:51,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4009773.3333333335, ans=0.125 2023-11-29 14:53:03,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2023-11-29 14:53:11,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4009906.6666666665, ans=0.0 2023-11-29 14:53:22,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4009973.3333333335, ans=0.2 2023-11-29 14:53:23,702 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 300, loss[loss=0.05531, simple_loss=0.08698, pruned_loss=0.004616, audio_tagging_loss=0.007197, over 14993.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.08937, pruned_loss=0.01184, audio_tagging_loss=0.01057, over 2381812.23 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:53:26,726 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601500 2023-11-29 14:53:43,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-11-29 14:53:44,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4010040.0, ans=0.1 2023-11-29 14:53:59,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.446e+01 1.010e+02 1.084e+02 1.415e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-29 14:53:59,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4010173.3333333335, ans=0.1 2023-11-29 14:54:03,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2023-11-29 14:54:23,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2023-11-29 14:54:26,218 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 350, loss[loss=0.06085, simple_loss=0.08048, pruned_loss=0.01095, audio_tagging_loss=0.009661, over 15468.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0894, pruned_loss=0.01178, audio_tagging_loss=0.009921, over 2530064.02 frames. 
], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:54:29,309 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601550 2023-11-29 14:54:48,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4010373.3333333335, ans=0.1 2023-11-29 14:54:53,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4010440.0, ans=0.0 2023-11-29 14:54:54,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4010440.0, ans=0.05 2023-11-29 14:54:54,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4010440.0, ans=0.2 2023-11-29 14:55:05,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4010506.6666666665, ans=0.125 2023-11-29 14:55:06,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-29 14:55:10,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4010506.6666666665, ans=0.125 2023-11-29 14:55:20,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-29 14:55:27,926 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 400, loss[loss=0.07009, simple_loss=0.09258, pruned_loss=0.01296, audio_tagging_loss=0.01083, over 15329.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08882, pruned_loss=0.01152, audio_tagging_loss=0.009595, over 2649900.41 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:55:30,312 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601600 2023-11-29 14:55:48,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-29 14:56:04,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.094e+01 9.565e+01 1.047e+02 1.359e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 14:56:08,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4010840.0, ans=0.1 2023-11-29 14:56:27,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0 2023-11-29 14:56:29,840 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 450, loss[loss=0.0475, simple_loss=0.05896, pruned_loss=0.006968, audio_tagging_loss=0.01105, over 15743.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08905, pruned_loss=0.01162, audio_tagging_loss=0.009394, over 2743653.28 frames. 
], batch size: 62, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:56:32,969 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601650 2023-11-29 14:56:40,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4010973.3333333335, ans=0.125 2023-11-29 14:56:45,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4011040.0, ans=0.0 2023-11-29 14:56:46,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=12.0 2023-11-29 14:56:53,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4011106.6666666665, ans=0.0 2023-11-29 14:57:15,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4011173.3333333335, ans=0.125 2023-11-29 14:57:23,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4011240.0, ans=0.125 2023-11-29 14:57:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4011240.0, ans=0.125 2023-11-29 14:57:31,215 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 500, loss[loss=0.05739, simple_loss=0.08175, pruned_loss=0.006741, audio_tagging_loss=0.009768, over 15004.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08896, pruned_loss=0.01166, audio_tagging_loss=0.009121, over 2809829.28 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:57:33,669 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601700 2023-11-29 14:57:38,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4011306.6666666665, ans=0.0 2023-11-29 14:58:06,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4011440.0, ans=0.0 2023-11-29 14:58:07,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.018e+01 9.718e+01 1.038e+02 1.323e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 14:58:18,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4011573.3333333335, ans=0.125 2023-11-29 14:58:27,890 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:58:30,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4011573.3333333335, ans=0.0 2023-11-29 14:58:32,358 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 550, loss[loss=0.07578, simple_loss=0.1004, pruned_loss=0.01781, audio_tagging_loss=0.007784, over 14232.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08907, pruned_loss=0.01155, audio_tagging_loss=0.009041, over 2861947.11 frames. 
], batch size: 53, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:58:34,992 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601750 2023-11-29 14:58:35,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4011640.0, ans=0.125 2023-11-29 14:58:54,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4011706.6666666665, ans=0.125 2023-11-29 14:59:00,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4011773.3333333335, ans=0.125 2023-11-29 14:59:16,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2023-11-29 14:59:24,046 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:59:35,618 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 600, loss[loss=0.06597, simple_loss=0.08799, pruned_loss=0.01294, audio_tagging_loss=0.009036, over 15289.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08893, pruned_loss=0.01143, audio_tagging_loss=0.008954, over 2902323.76 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:59:38,636 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601800 2023-11-29 14:59:40,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4011973.3333333335, ans=0.1 2023-11-29 14:59:50,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4012040.0, ans=0.0 2023-11-29 14:59:51,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4012040.0, ans=0.125 2023-11-29 14:59:51,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4012040.0, ans=0.07 2023-11-29 15:00:12,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.089e+01 9.742e+01 1.033e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 15:00:12,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4012173.3333333335, ans=0.2 2023-11-29 15:00:18,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4012173.3333333335, ans=0.125 2023-11-29 15:00:21,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4012173.3333333335, ans=0.1 2023-11-29 15:00:30,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4012240.0, ans=0.0 2023-11-29 15:00:38,275 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 650, loss[loss=0.06302, simple_loss=0.09555, pruned_loss=0.007071, audio_tagging_loss=0.008172, over 14851.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08829, pruned_loss=0.01133, audio_tagging_loss=0.008856, over 2930681.59 frames. 
], batch size: 54, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:00:38,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4012306.6666666665, ans=0.0 2023-11-29 15:00:40,849 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601850 2023-11-29 15:00:41,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4012306.6666666665, ans=0.0 2023-11-29 15:00:45,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4012306.6666666665, ans=0.125 2023-11-29 15:00:50,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4012373.3333333335, ans=0.0 2023-11-29 15:00:50,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-11-29 15:00:59,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4012373.3333333335, ans=0.125 2023-11-29 15:01:02,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4012440.0, ans=0.0 2023-11-29 15:01:16,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-29 15:01:25,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4012506.6666666665, ans=0.0 2023-11-29 15:01:39,810 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 700, loss[loss=0.06401, simple_loss=0.09602, pruned_loss=0.01006, audio_tagging_loss=0.005945, over 15475.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08895, pruned_loss=0.01147, audio_tagging_loss=0.008753, over 2963158.52 frames. 
], batch size: 58, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:01:42,854 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601900 2023-11-29 15:02:08,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4012773.3333333335, ans=0.125 2023-11-29 15:02:10,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4012773.3333333335, ans=0.125 2023-11-29 15:02:15,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 8.992e+01 9.586e+01 1.033e+02 1.329e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 15:02:22,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4012840.0, ans=0.0 2023-11-29 15:02:28,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4012906.6666666665, ans=0.2 2023-11-29 15:02:32,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4012906.6666666665, ans=0.1 2023-11-29 15:02:32,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4012906.6666666665, ans=0.125 2023-11-29 15:02:41,760 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 750, loss[loss=0.06306, simple_loss=0.08096, pruned_loss=0.01277, audio_tagging_loss=0.009805, over 15766.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08877, pruned_loss=0.01161, audio_tagging_loss=0.008728, over 2978422.22 frames. ], batch size: 61, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:02:44,201 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 601950 2023-11-29 15:02:45,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4012973.3333333335, ans=0.0 2023-11-29 15:02:52,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4012973.3333333335, ans=0.125 2023-11-29 15:03:12,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4013106.6666666665, ans=0.125 2023-11-29 15:03:18,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4013173.3333333335, ans=0.125 2023-11-29 15:03:44,107 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 800, loss[loss=0.06729, simple_loss=0.08272, pruned_loss=0.01545, audio_tagging_loss=0.01047, over 14840.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08918, pruned_loss=0.01164, audio_tagging_loss=0.008708, over 2997513.28 frames. 
], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:03:46,582 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 602000 2023-11-29 15:04:01,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4013373.3333333335, ans=0.07 2023-11-29 15:04:05,078 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 15:04:12,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4013440.0, ans=0.0 2023-11-29 15:04:21,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.185e+01 9.281e+01 9.942e+01 1.059e+02 1.465e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 15:04:32,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=12.0 2023-11-29 15:04:44,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4013573.3333333335, ans=0.1 2023-11-29 15:04:46,312 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 850, loss[loss=0.05704, simple_loss=0.07646, pruned_loss=0.00791, audio_tagging_loss=0.0109, over 14987.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08855, pruned_loss=0.01153, audio_tagging_loss=0.008814, over 3007254.97 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:04:48,732 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 602050 2023-11-29 15:05:27,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-29 15:05:37,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4013906.6666666665, ans=0.0 2023-11-29 15:05:48,331 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 900, loss[loss=0.05768, simple_loss=0.07983, pruned_loss=0.007253, audio_tagging_loss=0.01051, over 16150.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.0891, pruned_loss=0.01157, audio_tagging_loss=0.00878, over 3013326.54 frames. ], batch size: 59, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:05:50,765 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 602100 2023-11-29 15:06:26,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.010e+01 9.603e+01 1.027e+02 1.199e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 15:06:51,389 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 950, loss[loss=0.06213, simple_loss=0.07615, pruned_loss=0.01177, audio_tagging_loss=0.01228, over 15021.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0898, pruned_loss=0.01161, audio_tagging_loss=0.008775, over 3019136.51 frames. 
], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:06:53,955 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 602150 2023-11-29 15:07:08,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4014373.3333333335, ans=0.125 2023-11-29 15:07:15,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4014440.0, ans=0.0 2023-11-29 15:07:47,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4014573.3333333335, ans=0.125 2023-11-29 15:07:53,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-29 15:07:53,968 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 1000, loss[loss=0.0724, simple_loss=0.1016, pruned_loss=0.01409, audio_tagging_loss=0.007514, over 15015.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08989, pruned_loss=0.01166, audio_tagging_loss=0.008615, over 3032988.23 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:07:56,593 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 602200 2023-11-29 15:07:58,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4014640.0, ans=0.2 2023-11-29 15:08:04,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0 2023-11-29 15:08:06,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4014706.6666666665, ans=0.125 2023-11-29 15:08:14,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4014706.6666666665, ans=0.0 2023-11-29 15:08:24,038 WARNING [train_asr.py:1481] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 15:08:33,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 9.222e+01 9.984e+01 1.098e+02 2.505e+02, threshold=1.997e+02, percent-clipped=1.0 2023-11-29 15:08:56,581 INFO [train_asr.py:1235] (1/4) Epoch 51, batch 1050, loss[loss=0.07685, simple_loss=0.1074, pruned_loss=0.0156, audio_tagging_loss=0.007572, over 14953.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08966, pruned_loss=0.01172, audio_tagging_loss=0.008554, over 3035219.92 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:08:59,088 INFO [model.py:807] (1/4) Freeze_encoder: False; Current batch idx: 602250 2023-11-29 15:09:31,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 15:09:40,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.84 vs. 
limit=15.0 2023-11-29 15:09:44,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4015173.3333333335, ans=0.1 2023-11-29 15:09:45,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-29 15:09:50,706 INFO [checkpoint.py:75] (1/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/bad-model-1.pt 2023-11-29 15:09:52,932 INFO [train_asr.py:1596] (1/4) Saving batch to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/batch-8cd272e0-909e-4060-8425-07646bc9947a.pt 2023-11-29 15:09:53,013 INFO [train_asr.py:1602] (1/4) features shape: torch.Size([56, 1776, 80]) 2023-11-29 15:09:53,023 INFO [train_asr.py:1606] (1/4) num tokens: 1904
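The per-batch summaries above (the train_asr.py:1235 lines) and the periodic validation results (the train_asr.py:1267 lines) are easier to compare once pulled out of the log programmatically. Below is a minimal sketch that extracts the running tot_loss and the validation losses; it assumes the original one-record-per-line log file (named train-asr.log here purely as a placeholder) rather than the re-wrapped text above, and the regexes are written against the exact line shapes shown in this log.

import re

# Per-batch summary from train_asr.py:1235, e.g.
# "Epoch 50, batch 11300, loss[...], tot_loss[loss=0.06427, ...], batch size: 55, lr: 1.34e-03"
BATCH_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([0-9.eE+-]+)")
# Validation summary from train_asr.py:1267, e.g.
# "Epoch 50, validation: loss=0.05813, simple_loss=0.05044, ..."
VALID_RE = re.compile(r"Epoch (\d+), validation: loss=([0-9.eE+-]+)")

def scan_log(path="train-asr.log"):  # file name is a placeholder, not from the log
    train, valid = [], []
    with open(path) as f:
        for line in f:
            m = BATCH_RE.search(line)
            if m:
                train.append((int(m[1]), int(m[2]), float(m[3])))
                continue
            m = VALID_RE.search(line)
            if m:
                valid.append((int(m[1]), float(m[2])))
    return train, valid

train, valid = scan_log()
print(f"{len(train)} batch summaries, {len(valid)} validation points")
for epoch, loss in valid:
    print(f"epoch {epoch:3d}  validation loss {loss:.5f}")

On the stretch of log shown here it would report, for example, the epoch 50 validation loss of 0.05813 and the epoch 51 value of 0.05803.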
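The WARNING lines about excluded cuts (unbalanced/NeYOsnhOi4k_0.000_1.000.wav and unbalanced/5Y6u9AlD9S0_0.000_1.000.wav) both report 23 frames after subsampling against 24 BPE tokens: the pruned transducer loss cannot align more output tokens than encoder frames, so such cuts are dropped before training. These are one-second AudioSet clips carrying the placeholder transcript ("Dummy text added as a place holder...", used because AudioSet contributes only to the audio-tagging loss), and one second of audio is too short for the 24 placeholder tokens. A rough sketch of that validity check, assuming this is the criterion train_asr.py:1481 applies and using the subsampled length implied by the log (100 raw frames -> 23):

def keep_cut(num_frames: int, tokens: list) -> bool:
    # Length after the convolutional front end (subsampling factor 4 per the
    # config); the exact formula is inferred from the logged numbers,
    # where 100 raw frames map to 23 subsampled frames.
    T = ((num_frames - 7) // 2 + 1) // 2
    return T >= len(tokens)

# The excluded AudioSet cuts: 100 frames, 24 tokens -> 23 < 24, so dropped.
assert not keep_cut(100, ["tok"] * 24)
assert keep_cut(104, ["tok"] * 24)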
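The closing lines show the trainer writing bad-model-1.pt and a batch-*.pt file together with the batch's feature shape (torch.Size([56, 1776, 80])) and token count (1904). In icefall this dump path is typically taken when a batch triggers a failure (for instance an exploding loss or an out-of-memory error), so the offending batch can be replayed offline; 56 cuts at up to 1776 frames is on the order of 1000 seconds at the usual 10 ms hop, consistent with the max_duration of 1000 in the config. A minimal inspection sketch, assuming only that the file deserializes with torch.load; its internal layout is not shown in the log, so the recursion below just reports whatever tensors it finds.

import torch

# Path copied from the log above.
BATCH_PT = (
    "multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/"
    "batch-8cd272e0-909e-4060-8425-07646bc9947a.pt"
)

batch = torch.load(BATCH_PT, map_location="cpu")

def describe(obj, prefix="batch"):
    # Recursively report tensor shapes; tolerates dicts/lists of tensors.
    if isinstance(obj, torch.Tensor):
        print(f"{prefix}: shape={tuple(obj.shape)}, dtype={obj.dtype}")
    elif isinstance(obj, dict):
        for k, v in obj.items():
            describe(v, f"{prefix}[{k!r}]")
    elif isinstance(obj, (list, tuple)):
        for i, v in enumerate(obj):
            describe(v, f"{prefix}[{i}]")
    else:
        print(f"{prefix}: {type(obj).__name__}")

describe(batch)  # expect a features tensor of shape (56, 1776, 80) per the log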