[2025-05-10 11:15:11] Created output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds1000_upsample1000_predict_mask
[2025-05-10 11:15:11] Chat mode disabled
[2025-05-10 11:15:11] Model size is 3B or smaller (1B). Using full fine-tuning.
[2025-05-10 11:15:11] No QA format data will be used
[2025-05-10 11:15:11] Limiting dataset size to: 1000 samples
[2025-05-10 11:15:12] =======================================
[2025-05-10 11:15:12] Starting training for model: google/gemma-3-1b-pt
[2025-05-10 11:15:12] =======================================
[2025-05-10 11:15:12] CUDA_VISIBLE_DEVICES: 0,1,2,3
[2025-05-10 11:15:12] WANDB_PROJECT: wikidyk-ar
[2025-05-10 11:15:12] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-10 11:15:12] Global Batch Size: 128
[2025-05-10 11:15:12] Data Size: 1000
[2025-05-10 11:15:12] Executing command: torchrun --nproc_per_node "4" --master-port 29501 src/train.py --model_name_or_path "google/gemma-3-1b-pt" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results_pred_mask/google_gemma-3-1b-pt_ds1000_upsample1000_predict_mask" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "true" --ds_size 1000
[2025-05-10 11:15:12] Training started at Sat May 10 11:15:12 UTC 2025
[2025-05-10 11:15:13] ERROR: Training failed for google/gemma-3-1b-pt with exit code 1
[2025-05-10 11:15:13] Check error log for details: train_results_pred_mask/google_gemma-3-1b-pt_ds1000_upsample1000_predict_mask/20250510_111509.log
[2025-05-10 11:15:13] Resource usage after training google/gemma-3-1b-pt:
[2025-05-10 11:15:13] GPU memory usage (used / total):
GPU 0: 3635 MiB / 40960 MiB
GPU 1: 3615 MiB / 40960 MiB
GPU 2: 3619 MiB / 40960 MiB
GPU 3: 3611 MiB / 40960 MiB
[2025-05-10 11:15:13] Disk space usage for model outputs:
4.0K train_results_pred_mask/google_gemma-3-1b-pt_ds1000_upsample1000_predict_mask
[2025-05-10 11:15:13]