How much GPU memory do I need to fine-tune large-v3?
I am trying to fine-tune large-v3 with the Transformers fine-tuning script:

```
torchrun --nproc_per_node 3 whisper_transformer_001.py \
  --model_name_or_path="/home/lane/ai/models/openai/whisper/largeV3" \
  --dataset_name="mozilla-foundation/common_voice_2_0" \
  --dataset_config_name="zh-CN" \
  --language="Chinese" \
  --task="transcribe" \
  --train_split_name="train+validation" \
  --eval_split_name="test" \
  --max_steps="400" \
  --output_dir="/home/lane/ai/models/openai/whisper/largeV3-Chinese" \
  --per_device_train_batch_size="1" \
  --per_device_eval_batch_size="1" \
  --logging_steps="5" \
  --learning_rate="1e-5" \
  --warmup_steps="40" \
  --eval_strategy="steps" \
  --eval_steps="100" \
  --save_strategy="steps" \
  --save_steps="100" \
  --generation_max_length="95" \
  --preprocessing_num_workers="8" \
  --max_duration_in_seconds="30" \
  --text_column_name="sentence" \
  --freeze_feature_encoder="False" \
  --gradient_checkpointing \
  --fp16 \
  --overwrite_output_dir \
  --do_train --do_eval \
  --predict_with_generate
```
I have 3 GPUs with 22 GB of memory each, but every time I run this I hit CUDA out of memory.
Has anyone fine-tuned large-v3? How did you do it, and with how much GPU memory? Thanks.
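A back-of-the-envelope estimate explains the OOM. Whisper large-v3 has roughly 1.55B parameters, and standard fp16 mixed-precision training with AdamW keeps fp16 weights and gradients plus an fp32 master copy and two fp32 optimizer moments per parameter. A quick sketch of that rule-of-thumb accounting (not exact numbers):

```python
# Rough memory estimate for Whisper large-v3 (~1.55B parameters) under
# fp16 mixed-precision training with AdamW. Rule-of-thumb accounting only;
# activations, buffers, and CUDA overhead come on top of this.
params = 1.55e9
bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master copy of the weights
    + 8  # AdamW first and second moments in fp32
)
print(f"{params * bytes_per_param / 2**30:.1f} GiB")  # ~23 GiB before activations
```

Since torchrun with plain DDP replicates the model, gradients, and optimizer states on every GPU, that ~23 GiB of fixed state already exceeds a 22 GB card even at batch size 1. Sharding the optimizer states (e.g. DeepSpeed ZeRO) or shrinking them (an 8-bit optimizer) is usually needed at this scale.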
Was able to fine-tune the large-v3 model on an A100. Max GPU memory consumption was 32 GB, with per_device_train_batch_size=2.
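For the 22 GB cards in the original post, the usual knobs are an 8-bit optimizer and gradient accumulation, on top of the gradient checkpointing and fp16 already in the command. A minimal sketch of those settings via Seq2SeqTrainingArguments, with illustrative values (it assumes bitsandbytes is installed for the 8-bit AdamW, and is not the config used in the A100 run above):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative memory-saving settings for ~22 GB GPUs; the values here
# are assumptions for the sketch, not a tested recipe.
args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-zh",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # recover effective batch size without extra memory
    optim="adamw_bnb_8bit",            # 8-bit AdamW: ~2 bytes/param of optimizer state instead of 8
    gradient_checkpointing=True,       # trade recompute for activation memory
    fp16=True,
    learning_rate=1e-5,
    max_steps=400,
    predict_with_generate=True,
)
```

With 8-bit optimizer states, the per-parameter accounting drops from ~16 to ~10 bytes (about 14.5 GiB of fixed state), which leaves headroom for activations on a 22 GB card.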
Thanks a lot for sharing.
Can you share your training config, please? Are you using fp16, bf16, or fp32? And which optimizer?
@metricv
I wrote a blog post: https://me.sakana.moe/2024/09/03/a-complete-guide-to-fine-tuning-and-deploying-whisper-models/