Fine-tuning guide? Supported audio formats for fine-tuning?
Please tell me about fine-tuning this system. What is the VRAM requirement? In what format (preferably CSV or TSV) do we provide the audio paths and transcripts?
How do we set the language for the transcripts?
Hi @neurlang
The Canary training script takes a dataset manifest as input in JSONL format. Our tutorial has details on how to create the manifest file and how to fine-tune the canary-flash models.
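For reference, each line of the manifest is a standalone JSON object, and the transcript language is set per utterance via the source_lang/target_lang fields rather than a global flag. A minimal illustrative entry (paths and values are made up; see the tutorial for the authoritative set of Canary fields):

{"audio_filepath": "/data/audio/utt0001.wav", "duration": 4.2, "text": "hello world", "source_lang": "en", "target_lang": "en", "taskname": "asr", "pnc": "yes"}

Since the script expects JSONL, a CSV/TSV file would need to be converted into lines of this shape first.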
Canary-180M-Flash was trained on 32 A100 80GB GPUs. Based on the size of your GPU, you can scale the batch size. The effective batch size can be controlled using trainer.accumulate_grad_batches and the number of GPUs. Be sure to tune the learning rate accordingly.
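As a rough sketch of the arithmetic (the override names match the training script; the numbers are only illustrative):

# effective batch size = train_ds.batch_size x accumulate_grad_batches x num GPUs
# e.g. 4 x 16 x 1 = 64 samples per optimizer step on a single GPU
python speech_to_text_aed.py \
    model.train_ds.batch_size=4 \
    trainer.accumulate_grad_batches=16 \
    trainer.devices=1 \
    model.optim.lr=3e-4  # scale this with the effective batch size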
Hope that helps! Please feel free to reach out if you have more questions.
Is it possible to fine-tune just the vocabulary and language, comparable to training KenLM/n-gram language models for the older CTC models? It was quite neat to train on text only instead of audio plus text.
There is, for example, the class BeamSearchSequenceGeneratorWithLanguageModel.
Could this be used to quickly fine-tune the transcriptions to an expert domain?
@halbefn
You can try decoding with an n-gram LM; it's available in the main branch.
For details on building and using the LM, please see the description of PR https://github.com/NVIDIA/NeMo/pull/12730
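For the model-independent part of that, the n-gram itself is built with KenLM's own tools; a generic sketch, assuming the kenlm binaries are installed and corpus.txt holds your domain text (note that NeMo typically builds the LM over the model's tokenizer output rather than raw words; the exact pipeline is in the PR):

# train a 4-gram ARPA model on plain text, then binarize it for fast loading
lmplz -o 4 < corpus.txt > domain_lm.arpa
build_binary domain_lm.arpa domain_lm.bin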
@artbataev
Thank you, it works quite well.
For anyone reading this: changing e.g. multitask_decoding.strategy="beam" to decoding.strategy="beam" allows you to use the KenLM models on longer audio files with the chunked inference script: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/aed/speech_to_text_aed_chunked_infer.py
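A sketch of such an invocation: decoding.strategy is the override confirmed above, the remaining arguments are illustrative, and the two ngram_lm_* names are my assumption about the PR's config, so verify them against #12730:

# NOTE: the two ngram_lm_* override names below are assumed from PR #12730 -- verify there
python examples/asr/asr_chunked_inference/aed/speech_to_text_aed_chunked_infer.py \
    pretrained_name="nvidia/canary-1b-flash" \
    audio_dir=/data/long_audio \
    output_filename=transcripts.json \
    chunk_len_in_secs=40 \
    decoding.strategy="beam" \
    decoding.beam.ngram_lm_model=domain_lm.bin \
    decoding.beam.ngram_lm_alpha=0.3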
Edit: However, if you add "timestamps=True" to speech_to_text_aed_chunked_infer.py, you get nonsense transcripts.
I am following the training-from-scratch guide here: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Canary_Multitask_Speech_Model.ipynb
GPU - RTX 3090
For a small dataset, training works just as expected.
When I use a 500-hour dataset that also contains audio files longer than 10 seconds, I clip those to 5 seconds when building the manifest. It always runs OOM during the validation_ds sanity check, so I had to disable it.
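For reference, the clipping itself is just a sox trim over the source files, along these lines (paths illustrative; the manifest then gets duration <= 5.0 for those entries):

# keep only the first 5 seconds of each file
for f in wavs/*.wav; do
    sox "$f" "clipped/$(basename "$f")" trim 0 5
done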
Then I train with this command:
HYDRA_FULL_ERROR=1 python scripts/speech_to_text_aed.py \
--config-path="../config" \
--config-name="fast-conformer_aed.yaml" \
name="canary-small" \
model.prompt_format="canary2" \
model.train_ds.manifest_filepath="datasets/SPRING_INX_R1/valid_manifest.json" \
model.train_ds.batch_size=1 \
model.train_ds.batch_duration=30 \
model.validation_ds.manifest_filepath="datasets/SPRING_INX_R1/valid_manifest.json" \
model.test_ds.manifest_filepath="datasets/SPRING_INX_R1/valid_manifest.json" \
model.tokenizer.langs.hi.dir="tokenizers/hi_sprinx_1538/tokenizer_spe_bpe_v1538" \
model.tokenizer.langs.spl_tokens.dir="tokenizers/spl_tokens" \
spl_tokens.model_dir="tokenizers/spl_tokens" \
model.encoder.n_layers=2 \
model.transf_decoder.config_dict.num_layers=2 \
exp_manager.exp_dir="canary_results" \
exp_manager.resume_ignore_no_checkpoint=true \
trainer.max_steps=1000 \
trainer.log_every_n_steps=1 \
trainer.num_sanity_val_steps=0
and it still fails with this log:
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[NeMo W 2025-05-02 07:25:39 nemo_logging:405] No version folders would be created under the log folder as 'resume_if_exists' is enabled.
[NeMo W 2025-05-02 07:25:39 nemo_logging:405] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :canary_results/canary-small/checkpoints. Training from scratch.
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] Experiments will be logged at canary_results/canary-small
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] TensorboardLogger has been set up
[NeMo W 2025-05-02 07:25:39 nemo_logging:405] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to 1000. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] TFLOPs per sec per GPU will be calculated, conditioned on supported models. Defaults to -1 upon failure.
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] Detected spl_tokens config. Building tokenizer.
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] tokenizer model tokenizers/spl_tokens/tokenizer.model already exists
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] _setup_tokenizer: detected an aggregate tokenizer
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] Tokenizer SentencePieceTokenizer initialized with 1152 tokens
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] Tokenizer SentencePieceTokenizer initialized with 1538 tokens
[NeMo I 2025-05-02 07:25:39 nemo_logging:393] Aggregate vocab size: 2690
[NeMo I 2025-05-02 07:25:40 nemo_logging:393] We will be using a Lhotse DataLoader.
Initializing Lhotse CutSet from a single NeMo manifest
(is_tarred=False): 'datasets/SPRING_INX_R1/valid_manifest.json'
[NeMo W 2025-05-02 07:25:40 nemo_logging:405] You are using a non-tarred dataset and requested tokenization during data sampling (pretokenize=True). This will cause the tokenization to happen in the main (GPU) process,possibly impacting the training speed if your tokenizer is very large.If the impact is noticable, set pretokenize=False in dataloader config.(note: that will disable token-per-second filtering and 2D bucketing features)
[NeMo I 2025-05-02 07:25:40 nemo_logging:393] Creating a Lhotse DynamicBucketingSampler (max_batch_duration=30.0 max_batch_size=1)
[NeMo I 2025-05-02 07:25:41 nemo_logging:393] We will be using a Lhotse DataLoader.
[NeMo W 2025-05-02 07:25:41 nemo_logging:405] The following configuration keys are ignored by Lhotse dataloader: use_start_end_token
Initializing Lhotse CutSet from a single NeMo manifest
(is_tarred=False): 'datasets/SPRING_INX_R1/valid_manifest.json'
[NeMo W 2025-05-02 07:25:41 nemo_logging:405] You are using a non-tarred dataset and requested tokenization during data sampling (pretokenize=True). This will cause the tokenization to happen in the main (GPU) process,possibly impacting the training speed if your tokenizer is very large.If the impact is noticable, set pretokenize=False in dataloader config.(note: that will disable token-per-second filtering and 2D bucketing features)
[NeMo I 2025-05-02 07:25:41 nemo_logging:393] Creating a Lhotse DynamicCutSampler (bucketing is disabled, (max_batch_duration=None max_batch_size=1)
[NeMo I 2025-05-02 07:25:41 nemo_logging:393] We will be using a Lhotse DataLoader.
[NeMo W 2025-05-02 07:25:41 nemo_logging:405] The following configuration keys are ignored by Lhotse dataloader: use_start_end_token
Initializing Lhotse CutSet from a single NeMo manifest
(is_tarred=False): 'datasets/SPRING_INX_R1/valid_manifest.json'
[NeMo W 2025-05-02 07:25:41 nemo_logging:405] You are using a non-tarred dataset and requested tokenization during data sampling (pretokenize=True). This will cause the tokenization to happen in the main (GPU) process,possibly impacting the training speed if your tokenizer is very large.If the impact is noticable, set pretokenize=False in dataloader config.(note: that will disable token-per-second filtering and 2D bucketing features)
[NeMo I 2025-05-02 07:25:41 nemo_logging:393] Creating a Lhotse DynamicCutSampler (bucketing is disabled, (max_batch_duration=None max_batch_size=8)
[NeMo I 2025-05-02 07:25:41 nemo_logging:393] PADDING: 0
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2025-05-02 07:25:43 nemo_logging:393] Optimizer config = AdamW (
Parameter Group 0
amsgrad: False
betas: [0.9, 0.98]
capturable: False
decoupled_weight_decay: True
differentiable: False
eps: 1e-08
foreach: None
fused: None
lr: 0.0003
maximize: False
weight_decay: 0.001
)
[NeMo I 2025-05-02 07:25:43 nemo_logging:393] Scheduler "<nemo.core.optim.lr_scheduler.InverseSquareRootAnnealing object at 0x7f20e6b1ff50>"
will be used during training (effective maximum steps = 1000) -
Parameters :
(warmup_steps: 2500
warmup_ratio: null
min_lr: 1.0e-06
max_steps: 1000
)
| Name | Type | Params | Mode
-----------------------------------------------------------------------------------
0 | preprocessor | AudioToMelSpectrogramPreprocessor | 0 | train
1 | encoder | ConformerEncoder | 54.8 M | train
2 | encoder_decoder_proj | Identity | 0 | train
3 | transf_decoder | TransformerDecoderNM | 36.4 M | train
4 | log_softmax | TokenClassifier | 2.8 M | train
5 | loss | SmoothedCrossEntropyLoss | 0 | train
6 | spec_augmentation | SpectrogramAugmentation | 0 | train
7 | val_loss | GlobalAverageLossMetric | 0 | train
8 | wer | WER | 0 | train
9 | bleu | BLEU | 0 | train
-----------------------------------------------------------------------------------
91.1 M Trainable params
0 Non-trainable params
91.1 M Total params
364.444 Total estimated model params size (MB)
136 Modules in train mode
0 Modules in eval mode
[NeMo W 2025-05-02 07:25:46 nemo_logging:405] /opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
Epoch 0: | | 0/? [00:00<?, ?it/s]Error executing job with overrides: ['name=canary-small', 'model.prompt_format=canary2', 'model.train_ds.manifest_filepath=datasets/SPRING_INX_R1/valid_manifest.json', 'model.validation_ds.manifest_filepath=datasets/SPRING_INX_R1/valid_manifest.json', 'model.test_ds.manifest_filepath=datasets/SPRING_INX_R1/valid_manifest.json', 'model.tokenizer.langs.hi.dir=tokenizers/hi_sprinx_1538/tokenizer_spe_bpe_v1538', 'model.tokenizer.langs.spl_tokens.dir=tokenizers/spl_tokens', 'spl_tokens.model_dir=tokenizers/spl_tokens', 'model.encoder.n_layers=2', 'model.transf_decoder.config_dict.num_layers=2', 'exp_manager.exp_dir=canary_results', 'exp_manager.resume_ignore_no_checkpoint=true', 'trainer.max_steps=1000', 'trainer.log_every_n_steps=1', 'trainer.num_sanity_val_steps=0']
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/nemo_train/scripts/speech_to_text_aed.py", line 92, in <module>
[rank0]: main()
[rank0]: File "/home/NeMo/nemo/core/config/hydra_runner.py", line 129, in wrapper
[rank0]: _run_hydra(
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
[rank0]: _run_app(
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/_internal/utils.py", line 457, in _run_app
[rank0]: run_and_report(
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
[rank0]: raise ex
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
[rank0]: return func()
[rank0]: ^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
[rank0]: lambda: hydra.run(
[rank0]: ^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 132, in run
[rank0]: _ = ret.return_value
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/core/utils.py", line 260, in return_value
[rank0]: raise self._return_value
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/hydra/core/utils.py", line 186, in run_job
[rank0]: ret.return_value = task_function(task_cfg)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nemo_train/scripts/speech_to_text_aed.py", line 84, in main
[rank0]: trainer.fit(aed_model)
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
[rank0]: call._call_and_handle_interrupt(
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
[rank0]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
[rank0]: return function(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
[rank0]: self._run(model, ckpt_path=ckpt_path)
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 981, in _run
[rank0]: results = self._run_stage()
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1025, in _run_stage
[rank0]: self.fit_loop.run()
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
[rank0]: self.advance()
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
[rank0]: self.epoch_loop.run(self._data_fetcher)
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
[rank0]: self.advance(data_fetcher)
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 250, in advance
[rank0]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 190, in run
[rank0]: self._optimizer_step(batch_idx, closure)
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 268, in _optimizer_step
[rank0]: call._call_lightning_module_hook(
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 167, in _call_lightning_module_hook
[rank0]: output = fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/core/module.py", line 1306, in optimizer_step
[rank0]: optimizer.step(closure=optimizer_closure)
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py", line 153, in step
[rank0]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/strategies/ddp.py", line 270, in optimizer_step
[rank0]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 238, in optimizer_step
[rank0]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/amp.py", line 75, in optimizer_step
[rank0]: return super().optimizer_step(optimizer, model=model, closure=closure, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
[rank0]: return optimizer.step(closure=closure, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 124, in wrapper
[rank0]: return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/optim/optimizer.py", line 485, in wrapper
[rank0]: out = func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/optim/optimizer.py", line 79, in _use_grad
[rank0]: ret = func(self, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/optim/adam.py", line 225, in step
[rank0]: loss = closure()
[rank0]: ^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/precision.py", line 108, in _wrap_closure
[rank0]: closure_result = closure()
[rank0]: ^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 144, in __call__
[rank0]: self._result = self.closure(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 129, in closure
[rank0]: step_output = self._step_fn()
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 317, in _training_step
[rank0]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 319, in _call_strategy_hook
[rank0]: output = fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 389, in training_step
[rank0]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 640, in __call__
[rank0]: wrapper_output = wrapper_module(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1637, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1464, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in wrapped_forward
[rank0]: out = method(*_args, **_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/utils/model_utils.py", line 477, in wrap_training_step
[rank0]: output_dict = wrapped(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/models/aed_multitask_models.py", line 715, in training_step
[rank0]: transf_log_probs, encoded_len, enc_states, enc_mask = self.forward(
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/core/classes/common.py", line 1081, in wrapped_call
[rank0]: outputs = wrapped(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/models/aed_multitask_models.py", line 683, in forward
[rank0]: encoded, encoded_len = self.encoder(audio_signal=processed_signal, length=processed_signal_length)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/core/classes/common.py", line 1081, in wrapped_call
[rank0]: outputs = wrapped(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/modules/conformer_encoder.py", line 523, in forward
[rank0]: return self.forward_internal(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/modules/conformer_encoder.py", line 601, in forward_internal
[rank0]: audio_signal = layer(
[rank0]: ^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/parts/submodules/conformer_modules.py", line 181, in forward
[rank0]: x = self.self_attn(query=x, key=x, value=x, mask=att_mask, pos_emb=pos_emb, cache=cache_last_channel)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/parts/submodules/multi_head_attention.py", line 314, in forward
[rank0]: matrix_bd = self.rel_shift(matrix_bd)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/NeMo/nemo/collections/asr/parts/submodules/multi_head_attention.py", line 266, in rel_shift
[rank0]: x = torch.nn.functional.pad(x, pad=(1, 0)) # (b, h, t1, t2+1)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/nemo/lib/python3.11/site-packages/torch/nn/functional.py", line 5209, in pad
[rank0]: return torch._C._nn.pad(input, pad, mode, value)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 11.66 GiB. GPU 0 has a total capacity of 23.57 GiB of which 3.15 GiB is free. Including non-PyTorch memory, this process has 20.41 GiB memory in use. Of the allocated memory 19.60 GiB is allocated by PyTorch, and 55.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)