|
[2025-05-10 11:15:25] Created output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask |
|
[2025-05-10 11:15:25] Chat mode disabled |
|
[2025-05-10 11:15:25] Model size is 3B or smaller (1 B). Using full fine-tuning. |
|
[2025-05-10 11:15:25] No QA format data will be used |
|
[2025-05-10 11:15:25] Limiting dataset size to: 100 samples |
|
[2025-05-10 11:15:25] ======================================= |
|
[2025-05-10 11:15:25] Starting training for model: google/gemma-3-1b-pt |
|
[2025-05-10 11:15:25] ======================================= |
|
[2025-05-10 11:15:25] CUDA_VISIBLE_DEVICES: 0,1,2,3 |
|
[2025-05-10 11:15:25] WANDB_PROJECT: wikidyk-ar |
|
[2025-05-10 11:15:25] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json |
|
[2025-05-10 11:15:25] Global Batch Size: 128 |
|
[2025-05-10 11:15:25] Data Size: 100 |
|
[2025-05-10 11:15:25] Executing command: torchrun --nproc_per_node "4" --master-port 29506 src/train.py --model_name_or_path "google/gemma-3-1b-pt" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "true" --ds_size 100 |
|
[2025-05-10 11:15:25] Training started at Sat May 10 11:15:25 UTC 2025 |
|
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] |
|
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] ***************************************** |
|
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. |
|
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] ***************************************** |
|
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask |
|
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead. |
|
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
|
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask |
|
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask |
|
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead. |
|
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
|
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask |
|
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead. |
|
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
|
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead. |
|
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
|
WARNING:root:Loading data... |
|
WARNING:root:Loading data... |
|
WARNING:root:Loading data... |
|
WARNING:root:Loading data... |
|
WARNING:root:Dataset initialized with all QA data: |
|
WARNING:root: - 0 QA examples |
|
WARNING:root: - 100 fact examples with upsampling factor 1000 |
|
WARNING:root: - Total examples: 100000 |
|
/root/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead. |
|
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module) |
|
WARNING:root:Dataset initialized with all QA data: |
|
WARNING:root: - 0 QA examples |
|
WARNING:root: - 100 fact examples with upsampling factor 1000 |
|
WARNING:root: - Total examples: 100000 |
|
/root/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead. |
|
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module) |
|
WARNING:root:Dataset initialized with all QA data: |
|
WARNING:root: - 0 QA examples |
|
WARNING:root: - 100 fact examples with upsampling factor 1000 |
|
WARNING:root: - Total examples: 100000 |
|
/root/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead. |
|
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module) |
|
WARNING:root:Dataset initialized with all QA data: |
|
WARNING:root: - 0 QA examples |
|
WARNING:root: - 100 fact examples with upsampling factor 1000 |
|
WARNING:root: - Total examples: 100000 |
|
/root/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead. |
|
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module) |
|
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter. |
|
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`. |
|
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`. |
|
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`. |
|
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin |
|
wandb: Tracking run with wandb version 0.19.10 |
|
wandb: Run data is saved locally in /root/WikiDYKEvalV2/wandb/run-20250510_111541-7crva42l |
|
wandb: Run `wandb offline` to turn off syncing. |
|
wandb: Syncing run train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask |
|
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar |
|
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/7crva42l |
|
0%| | 0/782 [00:00<?, ?it/s]
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`. |
|
[rank1]:[W510 11:15:42.657900696 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) |
|
[rank2]:[W510 11:15:42.662607711 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) |
|
[rank3]:[W510 11:15:42.675341163 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) |
|
[rank0]:[W510 11:15:42.736902974 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) |
|
0%| | 1/782 [00:01<17:43, 1.36s/it]
0%| | 2/782 [00:02<12:40, 1.03it/s]
0%| | 3/782 [00:02<11:14, 1.15it/s]
1%| | 4/782 [00:03<10:49, 1.20it/s]
1%| | 5/782 [00:04<10:32, 1.23it/s]
1%| | 6/782 [00:05<10:16, 1.26it/s]
1%| | 7/782 [00:05<10:06, 1.28it/s]
1%| | 8/782 [00:06<10:01, 1.29it/s]
1%| | 9/782 [00:07<09:53, 1.30it/s]
1%|β | 10/782 [00:08<09:54, 1.30it/s]
1%|β | 11/782 [00:08<09:53, 1.30it/s]
2%|β | 12/782 [00:09<09:44, 1.32it/s]
2%|β | 13/782 [00:10<09:49, 1.31it/s]
2%|β | 14/782 [00:11<09:46, 1.31it/s]
2%|β | 15/782 [00:11<09:44, 1.31it/s]
2%|β | 16/782 [00:12<09:41, 1.32it/s]
2%|β | 17/782 [00:13<09:42, 1.31it/s]
2%|β | 18/782 [00:14<09:32, 1.34it/s]
2%|β | 19/782 [00:14<09:38, 1.32it/s]
3%|β | 20/782 [00:15<09:37, 1.32it/s]
3%|β | 21/782 [00:16<09:24, 1.35it/s]
3%|β | 22/782 [00:17<09:21, 1.35it/s]
3%|β | 23/782 [00:17<09:26, 1.34it/s]
3%|β | 24/782 [00:18<09:31, 1.33it/s]
3%|β | 25/782 [00:19<10:07, 1.25it/s]
3%|β | 26/782 [00:20<10:07, 1.24it/s]
3%|β | 27/782 [00:21<10:01, 1.26it/s]
4%|β | 28/782 [00:22<09:57, 1.26it/s]
4%|β | 29/782 [00:22<09:55, 1.26it/s]
4%|β | 30/782 [00:23<09:47, 1.28it/s]
4%|β | 31/782 [00:24<10:49, 1.16it/s]
4%|β | 32/782 [00:25<10:28, 1.19it/s]
4%|β | 33/782 [00:26<10:11, 1.22it/s]
4%|β | 34/782 [00:26<09:47, 1.27it/s]
4%|β | 35/782 [00:27<09:43, 1.28it/s]
5%|β | 36/782 [00:28<09:44, 1.28it/s]
5%|β | 37/782 [00:29<09:43, 1.28it/s]
5%|β | 38/782 [00:29<09:39, 1.28it/s]
5%|β | 39/782 [00:30<09:34, 1.29it/s]
5%|β | 40/782 [00:31<09:33, 1.29it/s]
5%|β | 41/782 [00:32<09:25, 1.31it/s]
5%|β | 42/782 [00:33<09:24, 1.31it/s]
5%|β | 43/782 [00:33<09:25, 1.31it/s]
6%|β | 44/782 [00:34<09:29, 1.30it/s]
6%|β | 45/782 [00:35<09:32, 1.29it/s]
6%|β | 46/782 [00:36<09:26, 1.30it/s]
6%|β | 47/782 [00:36<09:31, 1.29it/s]
6%|β | 48/782 [00:37<09:30, 1.29it/s]
6%|β | 49/782 [00:38<09:21, 1.30it/s]
6%|β | 50/782 [00:39<09:22, 1.30it/s]
{'loss': 3.1431, 'grad_norm': 8.1875, 'learning_rate': 1.874680306905371e-05, 'epoch': 0.06} |
|
7%|β | 51/782 [00:39<09:18, 1.31it/s]
7%|β | 52/782 [00:40<09:20, 1.30it/s]
7%|β | 53/782 [00:41<09:23, 1.29it/s]
7%|β | 54/782 [00:42<09:12, 1.32it/s]
7%|β | 55/782 [00:43<09:15, 1.31it/s]
7%|β | 56/782 [00:43<09:05, 1.33it/s]
7%|β | 57/782 [00:44<09:06, 1.33it/s]
7%|β | 58/782 [00:45<09:10, 1.32it/s]
8%|β | 59/782 [00:46<09:10, 1.31it/s]
8%|β | 60/782 [00:46<09:15, 1.30it/s]
8%|β | 61/782 [00:47<09:18, 1.29it/s]
8%|β | 62/782 [00:48<09:11, 1.31it/s]
8%|β | 63/782 [00:49<09:04, 1.32it/s]
8%|β | 64/782 [00:49<09:03, 1.32it/s]
8%|β | 65/782 [00:50<09:12, 1.30it/s]
8%|β | 66/782 [00:51<09:05, 1.31it/s]
9%|β | 67/782 [00:52<09:06, 1.31it/s]
9%|β | 68/782 [00:52<08:59, 1.32it/s]
9%|β | 69/782 [00:53<08:52, 1.34it/s]
9%|β | 70/782 [00:54<08:53, 1.34it/s]
9%|β | 71/782 [00:55<08:55, 1.33it/s]
9%|β | 72/782 [00:55<09:00, 1.31it/s]
9%|β | 73/782 [00:56<09:03, 1.30it/s]
9%|β | 74/782 [00:57<09:05, 1.30it/s]
10%|β | 75/782 [00:58<08:57, 1.31it/s]
10%|β | 76/782 [00:59<09:05, 1.29it/s]
10%|β | 77/782 [00:59<09:03, 1.30it/s]
10%|β | 78/782 [01:00<09:02, 1.30it/s]
10%|β | 79/782 [01:01<09:02, 1.30it/s]
10%|β | 80/782 [01:02<09:04, 1.29it/s]
10%|β | 81/782 [01:02<08:59, 1.30it/s]
10%|β | 82/782 [01:03<08:57, 1.30it/s]
11%|β | 83/782 [01:04<08:58, 1.30it/s]
11%|β | 84/782 [01:05<08:54, 1.31it/s]
11%|β | 85/782 [01:05<08:58, 1.29it/s]
11%|β | 86/782 [01:06<09:06, 1.27it/s]
11%|β | 87/782 [01:07<08:57, 1.29it/s]
11%|ββ | 88/782 [01:08<08:52, 1.30it/s]
11%|ββ | 89/782 [01:09<08:48, 1.31it/s]
12%|ββ | 90/782 [01:09<08:55, 1.29it/s]
12%|ββ | 91/782 [01:10<08:54, 1.29it/s]
12%|ββ | 92/782 [01:11<08:54, 1.29it/s]
12%|ββ | 93/782 [01:12<08:54, 1.29it/s]
12%|ββ | 94/782 [01:12<08:55, 1.29it/s]
12%|ββ | 95/782 [01:13<08:50, 1.30it/s]
12%|ββ | 96/782 [01:14<08:52, 1.29it/s]
12%|ββ | 97/782 [01:15<08:49, 1.29it/s]
13%|ββ | 98/782 [01:16<08:50, 1.29it/s]
13%|ββ | 99/782 [01:16<08:49, 1.29it/s]
13%|ββ | 100/782 [01:17<08:48, 1.29it/s]
{'loss': 0.1045, 'grad_norm': 3.90625, 'learning_rate': 1.7468030690537086e-05, 'epoch': 0.13} |
|
13%|ββ | 101/782 [01:18<08:47, 1.29it/s]
13%|ββ | 102/782 [01:19<08:41, 1.30it/s]
13%|ββ | 103/782 [01:19<08:45, 1.29it/s]
13%|ββ | 104/782 [01:20<08:38, 1.31it/s]
13%|ββ | 105/782 [01:21<08:28, 1.33it/s]
14%|ββ | 106/782 [01:22<08:36, 1.31it/s]
14%|ββ | 107/782 [01:22<08:37, 1.30it/s]
14%|ββ | 108/782 [01:23<08:30, 1.32it/s]
14%|ββ | 109/782 [01:24<08:29, 1.32it/s]
14%|ββ | 110/782 [01:25<08:35, 1.30it/s]
14%|ββ | 111/782 [01:25<08:25, 1.33it/s]
14%|ββ | 112/782 [01:26<08:18, 1.35it/s]
14%|ββ | 113/782 [01:27<08:11, 1.36it/s]
15%|ββ | 114/782 [01:28<08:13, 1.35it/s]
15%|ββ | 115/782 [01:28<08:11, 1.36it/s]
15%|ββ | 116/782 [01:29<08:16, 1.34it/s]
15%|ββ | 117/782 [01:30<08:10, 1.36it/s]
15%|ββ | 118/782 [01:31<08:12, 1.35it/s]
15%|ββ | 119/782 [01:31<08:10, 1.35it/s]
15%|ββ | 120/782 [01:32<08:20, 1.32it/s]
15%|ββ | 121/782 [01:33<08:16, 1.33it/s]
16%|ββ | 122/782 [01:34<08:17, 1.33it/s]
16%|ββ | 123/782 [01:34<08:22, 1.31it/s]
16%|ββ | 124/782 [01:35<08:26, 1.30it/s]
16%|ββ | 125/782 [01:36<08:20, 1.31it/s]
16%|ββ | 126/782 [01:37<08:23, 1.30it/s]
16%|ββ | 127/782 [01:37<08:17, 1.32it/s]
16%|ββ | 128/782 [01:38<08:20, 1.31it/s]
16%|ββ | 129/782 [01:39<08:10, 1.33it/s]
17%|ββ | 130/782 [01:40<08:06, 1.34it/s]
17%|ββ | 131/782 [01:40<08:14, 1.32it/s]
17%|ββ | 132/782 [01:41<08:05, 1.34it/s]
17%|ββ | 133/782 [01:42<08:13, 1.32it/s]
17%|ββ | 134/782 [01:43<08:10, 1.32it/s]
17%|ββ | 135/782 [01:43<08:07, 1.33it/s]
17%|ββ | 136/782 [01:44<08:09, 1.32it/s]
18%|ββ | 137/782 [01:45<08:11, 1.31it/s]
18%|ββ | 138/782 [01:46<08:17, 1.29it/s]
18%|ββ | 139/782 [01:47<08:17, 1.29it/s]
18%|ββ | 140/782 [01:47<08:15, 1.30it/s]
18%|ββ | 141/782 [01:48<08:11, 1.30it/s]
18%|ββ | 142/782 [01:49<08:10, 1.30it/s]
18%|ββ | 143/782 [01:50<08:13, 1.30it/s]
18%|ββ | 144/782 [01:50<08:15, 1.29it/s]
19%|ββ | 145/782 [01:51<08:04, 1.32it/s]
19%|ββ | 146/782 [01:52<07:56, 1.33it/s]
19%|ββ | 147/782 [01:53<07:57, 1.33it/s]
19%|ββ | 148/782 [01:53<07:59, 1.32it/s]
19%|ββ | 149/782 [01:54<08:00, 1.32it/s]
19%|ββ | 150/782 [01:55<08:00, 1.31it/s]
{'loss': 0.0137, 'grad_norm': 0.94921875, 'learning_rate': 1.6189258312020462e-05, 'epoch': 0.19} |
|
19%|ββ | 151/782 [01:56<08:08, 1.29it/s]
19%|ββ | 152/782 [01:56<08:03, 1.30it/s]
20%|ββ | 153/782 [01:57<08:06, 1.29it/s]
20%|ββ | 154/782 [01:58<08:04, 1.30it/s]
20%|ββ | 155/782 [01:59<08:03, 1.30it/s]
20%|ββ | 156/782 [02:00<08:03, 1.29it/s]
20%|ββ | 157/782 [02:00<08:04, 1.29it/s]
20%|ββ | 158/782 [02:01<07:57, 1.31it/s]
20%|ββ | 159/782 [02:02<07:56, 1.31it/s]
20%|ββ | 160/782 [02:03<07:52, 1.32it/s]
21%|ββ | 161/782 [02:03<07:47, 1.33it/s]
21%|ββ | 162/782 [02:04<07:43, 1.34it/s]
21%|ββ | 163/782 [02:05<07:40, 1.34it/s]
21%|ββ | 164/782 [02:06<07:47, 1.32it/s]
21%|ββ | 165/782 [02:06<07:47, 1.32it/s]
21%|ββ | 166/782 [02:07<07:47, 1.32it/s]
21%|βββ | 167/782 [02:08<07:44, 1.32it/s]
21%|βββ | 168/782 [02:09<07:37, 1.34it/s]
22%|βββ | 169/782 [02:09<07:30, 1.36it/s]
22%|βββ | 170/782 [02:10<07:28, 1.37it/s]
22%|βββ | 171/782 [02:11<07:37, 1.33it/s]
22%|βββ | 172/782 [02:12<07:40, 1.32it/s]
22%|βββ | 173/782 [02:12<07:43, 1.31it/s]
22%|βββ | 174/782 [02:13<07:39, 1.32it/s]
22%|βββ | 175/782 [02:14<07:43, 1.31it/s]
23%|βββ | 176/782 [02:15<07:42, 1.31it/s]
23%|βββ | 177/782 [02:15<07:44, 1.30it/s]
23%|βββ | 178/782 [02:16<07:39, 1.31it/s]
23%|βββ | 179/782 [02:17<07:37, 1.32it/s]
23%|βββ | 180/782 [02:18<07:32, 1.33it/s]
23%|βββ | 181/782 [02:18<07:29, 1.34it/s]
23%|βββ | 182/782 [02:19<07:26, 1.34it/s]
23%|βββ | 183/782 [02:20<07:30, 1.33it/s]
24%|βββ | 184/782 [02:21<07:31, 1.32it/s]
24%|βββ | 185/782 [02:21<07:40, 1.30it/s]
24%|βββ | 186/782 [02:22<07:40, 1.29it/s]
24%|βββ | 187/782 [02:23<07:36, 1.30it/s]
24%|βββ | 188/782 [02:24<07:36, 1.30it/s]
24%|βββ | 189/782 [02:25<07:36, 1.30it/s]
24%|βββ | 190/782 [02:25<07:37, 1.29it/s]
24%|βββ | 191/782 [02:26<07:39, 1.29it/s]
25%|βββ | 192/782 [02:27<07:28, 1.32it/s]
25%|βββ | 193/782 [02:28<07:25, 1.32it/s]
25%|βββ | 194/782 [02:28<07:28, 1.31it/s]
25%|βββ | 195/782 [02:29<07:27, 1.31it/s]
25%|βββ | 196/782 [02:30<07:27, 1.31it/s]
25%|βββ | 197/782 [02:31<07:30, 1.30it/s]
25%|βββ | 198/782 [02:31<07:31, 1.29it/s]
25%|βββ | 199/782 [02:32<07:28, 1.30it/s]
26%|βββ | 200/782 [02:33<07:27, 1.30it/s]
{'loss': 0.0047, 'grad_norm': 0.6484375, 'learning_rate': 1.4910485933503838e-05, 'epoch': 0.26} |
|
26%|βββ | 201/782 [02:34<07:23, 1.31it/s]
26%|βββ | 202/782 [02:35<07:21, 1.31it/s]
26%|βββ | 203/782 [02:35<07:19, 1.32it/s]
26%|βββ | 204/782 [02:36<07:23, 1.30it/s]
26%|βββ | 205/782 [02:37<07:24, 1.30it/s]
26%|βββ | 206/782 [02:38<07:25, 1.29it/s]
26%|βββ | 207/782 [02:38<07:21, 1.30it/s]
27%|βββ | 208/782 [02:39<07:20, 1.30it/s]
27%|βββ | 209/782 [02:40<07:12, 1.33it/s]
27%|βββ | 210/782 [02:41<07:10, 1.33it/s]
27%|βββ | 211/782 [02:41<07:12, 1.32it/s]
27%|βββ | 212/782 [02:42<07:06, 1.34it/s]
27%|βββ | 213/782 [02:43<07:09, 1.32it/s]
27%|βββ | 214/782 [02:44<07:10, 1.32it/s]
27%|βββ | 215/782 [02:44<07:08, 1.32it/s]
28%|βββ | 216/782 [02:45<07:08, 1.32it/s]
28%|βββ | 217/782 [02:46<07:04, 1.33it/s]
28%|βββ | 218/782 [02:47<07:10, 1.31it/s]
28%|βββ | 219/782 [02:47<07:14, 1.30it/s]
28%|βββ | 220/782 [02:48<07:12, 1.30it/s]
28%|βββ | 221/782 [02:49<07:14, 1.29it/s]
28%|βββ | 222/782 [02:50<07:09, 1.30it/s]
29%|βββ | 223/782 [02:51<07:04, 1.32it/s]
29%|βββ | 224/782 [02:51<07:07, 1.30it/s]
29%|βββ | 225/782 [02:52<07:05, 1.31it/s]
29%|βββ | 226/782 [02:53<07:00, 1.32it/s]
29%|βββ | 227/782 [02:54<07:02, 1.31it/s]
29%|βββ | 228/782 [02:54<07:02, 1.31it/s]
29%|βββ | 229/782 [02:55<07:03, 1.31it/s]
29%|βββ | 230/782 [02:56<06:53, 1.33it/s]
30%|βββ | 231/782 [02:57<06:57, 1.32it/s]
30%|βββ | 232/782 [02:57<06:56, 1.32it/s]
30%|βββ | 233/782 [02:58<07:03, 1.30it/s]
30%|βββ | 234/782 [02:59<06:59, 1.31it/s]
30%|βββ | 235/782 [03:00<07:00, 1.30it/s]
30%|βββ | 236/782 [03:00<07:02, 1.29it/s]
30%|βββ | 237/782 [03:01<07:02, 1.29it/s]
30%|βββ | 238/782 [03:02<07:01, 1.29it/s]
31%|βββ | 239/782 [03:03<06:55, 1.31it/s]
31%|βββ | 240/782 [03:04<06:57, 1.30it/s]
31%|βββ | 241/782 [03:04<07:02, 1.28it/s]
31%|βββ | 242/782 [03:05<06:58, 1.29it/s]
31%|βββ | 243/782 [03:06<06:50, 1.31it/s]
31%|βββ | 244/782 [03:07<06:52, 1.30it/s]
31%|ββββ | 245/782 [03:07<06:44, 1.33it/s]
31%|ββββ | 246/782 [03:08<06:44, 1.32it/s]
32%|ββββ | 247/782 [03:09<06:50, 1.30it/s]
32%|ββββ | 248/782 [03:10<06:49, 1.30it/s]
32%|ββββ | 249/782 [03:10<06:49, 1.30it/s]
32%|ββββ | 250/782 [03:11<06:45, 1.31it/s]
{'loss': 0.0028, 'grad_norm': 1.3984375, 'learning_rate': 1.3631713554987214e-05, 'epoch': 0.32} |
|
32%|ββββ | 251/782 [03:12<06:43, 1.32it/s]
32%|ββββ | 252/782 [03:13<06:40, 1.32it/s]
32%|ββββ | 253/782 [03:13<06:42, 1.31it/s]
32%|ββββ | 254/782 [03:14<06:47, 1.30it/s]
33%|ββββ | 255/782 [03:15<06:38, 1.32it/s]
33%|ββββ | 256/782 [03:16<06:37, 1.32it/s]
33%|ββββ | 257/782 [03:16<06:38, 1.32it/s]
33%|ββββ | 258/782 [03:17<06:44, 1.30it/s]
33%|ββββ | 259/782 [03:18<06:41, 1.30it/s]
33%|ββββ | 260/782 [03:19<06:41, 1.30it/s]
33%|ββββ | 261/782 [03:20<06:38, 1.31it/s]
34%|ββββ | 262/782 [03:20<06:31, 1.33it/s]
34%|ββββ | 263/782 [03:21<06:26, 1.34it/s]
34%|ββββ | 264/782 [03:22<06:30, 1.33it/s]
34%|ββββ | 265/782 [03:23<06:26, 1.34it/s]
34%|ββββ | 266/782 [03:23<06:24, 1.34it/s]
34%|ββββ | 267/782 [03:24<06:20, 1.35it/s]
34%|ββββ | 268/782 [03:25<06:21, 1.35it/s]
34%|ββββ | 269/782 [03:26<06:26, 1.33it/s]
35%|ββββ | 270/782 [03:26<06:26, 1.33it/s]
35%|ββββ | 271/782 [03:27<06:22, 1.34it/s]
35%|ββββ | 272/782 [03:28<06:26, 1.32it/s]
35%|ββββ | 273/782 [03:29<06:26, 1.32it/s]
35%|ββββ | 274/782 [03:29<06:33, 1.29it/s]
35%|ββββ | 275/782 [03:30<06:23, 1.32it/s]
35%|ββββ | 276/782 [03:31<06:23, 1.32it/s]
35%|ββββ | 277/782 [03:32<06:23, 1.32it/s]
36%|ββββ | 278/782 [03:32<06:22, 1.32it/s]
36%|ββββ | 279/782 [03:33<06:22, 1.32it/s]
36%|ββββ | 280/782 [03:34<06:24, 1.31it/s]
36%|ββββ | 281/782 [03:35<06:18, 1.32it/s]
36%|ββββ | 282/782 [03:35<06:18, 1.32it/s]
36%|ββββ | 283/782 [03:36<06:21, 1.31it/s]
36%|ββββ | 284/782 [03:37<06:21, 1.30it/s]
36%|ββββ | 285/782 [03:38<06:19, 1.31it/s]
37%|ββββ | 286/782 [03:38<06:16, 1.32it/s]
37%|ββββ | 287/782 [03:39<06:15, 1.32it/s]
37%|ββββ | 288/782 [03:40<06:16, 1.31it/s]
37%|ββββ | 289/782 [03:41<06:15, 1.31it/s]
37%|ββββ | 290/782 [03:42<06:15, 1.31it/s]
37%|ββββ | 291/782 [03:42<06:13, 1.31it/s]
37%|ββββ | 292/782 [03:43<06:11, 1.32it/s]
37%|ββββ | 293/782 [03:44<06:10, 1.32it/s]
38%|ββββ | 294/782 [03:45<06:10, 1.32it/s]
38%|ββββ | 295/782 [03:45<06:09, 1.32it/s]
38%|ββββ | 296/782 [03:46<06:13, 1.30it/s]
38%|ββββ | 297/782 [03:47<06:10, 1.31it/s]
38%|ββββ | 298/782 [03:48<06:09, 1.31it/s]
38%|ββββ | 299/782 [03:48<06:02, 1.33it/s]
38%|ββββ | 300/782 [03:49<06:04, 1.32it/s]
{'loss': 0.0013, 'grad_norm': 1.171875, 'learning_rate': 1.235294117647059e-05, 'epoch': 0.38} |
|
38%|ββββ | 301/782 [03:50<06:03, 1.32it/s]
39%|ββββ | 302/782 [03:51<06:04, 1.32it/s]
39%|ββββ | 303/782 [03:51<05:59, 1.33it/s]
39%|ββββ | 304/782 [03:52<06:07, 1.30it/s]
39%|ββββ | 305/782 [03:53<06:05, 1.30it/s]
39%|ββββ | 306/782 [03:54<06:04, 1.30it/s]
39%|ββββ | 307/782 [03:54<06:01, 1.32it/s]
39%|ββββ | 308/782 [03:55<05:58, 1.32it/s]
40%|ββββ | 309/782 [03:56<05:59, 1.32it/s]
40%|ββββ | 310/782 [03:57<05:54, 1.33it/s]
40%|ββββ | 311/782 [03:57<05:50, 1.34it/s]
40%|ββββ | 312/782 [03:58<05:53, 1.33it/s]
40%|ββββ | 313/782 [03:59<05:56, 1.31it/s]
40%|ββββ | 314/782 [04:00<05:54, 1.32it/s]
40%|ββββ | 315/782 [04:00<05:54, 1.32it/s]
40%|ββββ | 316/782 [04:01<05:53, 1.32it/s]
41%|ββββ | 317/782 [04:02<05:57, 1.30it/s]
41%|ββββ | 318/782 [04:03<05:56, 1.30it/s]
41%|ββββ | 319/782 [04:04<05:57, 1.29it/s]
41%|ββββ | 320/782 [04:04<05:56, 1.30it/s]
41%|ββββ | 321/782 [04:05<05:55, 1.30it/s]
41%|ββββ | 322/782 [04:06<05:50, 1.31it/s]
41%|βββββ | 323/782 [04:07<05:51, 1.30it/s]
41%|βββββ | 324/782 [04:07<05:49, 1.31it/s]
42%|βββββ | 325/782 [04:08<05:47, 1.31it/s]
42%|βββββ | 326/782 [04:09<05:43, 1.33it/s]
42%|βββββ | 327/782 [04:10<05:38, 1.34it/s]
42%|βββββ | 328/782 [04:10<05:41, 1.33it/s]
42%|βββββ | 329/782 [04:11<05:42, 1.32it/s]
42%|βββββ | 330/782 [04:12<05:42, 1.32it/s]
42%|βββββ | 331/782 [04:13<05:41, 1.32it/s]
42%|βββββ | 332/782 [04:13<05:43, 1.31it/s]
43%|βββββ | 333/782 [04:14<05:40, 1.32it/s]
43%|βββββ | 334/782 [04:15<05:43, 1.30it/s]
43%|βββββ | 335/782 [04:16<05:41, 1.31it/s]
43%|βββββ | 336/782 [04:17<05:42, 1.30it/s]
43%|βββββ | 337/782 [04:17<05:41, 1.30it/s]
43%|βββββ | 338/782 [04:18<05:38, 1.31it/s]
43%|βββββ | 339/782 [04:19<05:34, 1.32it/s]
43%|βββββ | 340/782 [04:20<05:33, 1.32it/s]
44%|βββββ | 341/782 [04:20<05:36, 1.31it/s]
44%|βββββ | 342/782 [04:21<05:31, 1.33it/s]
44%|βββββ | 343/782 [04:22<05:34, 1.31it/s]
44%|βββββ | 344/782 [04:23<05:33, 1.31it/s]
44%|βββββ | 345/782 [04:23<05:32, 1.31it/s]
44%|βββββ | 346/782 [04:24<05:31, 1.31it/s]
44%|βββββ | 347/782 [04:25<05:27, 1.33it/s]
45%|βββββ | 348/782 [04:26<05:23, 1.34it/s]
45%|βββββ | 349/782 [04:26<05:25, 1.33it/s]
45%|βββββ | 350/782 [04:27<05:23, 1.34it/s]
{'loss': 0.0007, 'grad_norm': 0.1298828125, 'learning_rate': 1.1074168797953967e-05, 'epoch': 0.45} |
|
45%|βββββ | 351/782 [04:28<05:30, 1.31it/s]
45%|βββββ | 352/782 [04:29<05:29, 1.31it/s]
45%|βββββ | 353/782 [04:29<05:28, 1.31it/s]
45%|βββββ | 354/782 [04:30<05:28, 1.30it/s]
45%|βββββ | 355/782 [04:31<05:22, 1.32it/s]
46%|βββββ | 356/782 [04:32<05:21, 1.33it/s]
46%|βββββ | 357/782 [04:32<05:20, 1.32it/s]
46%|βββββ | 358/782 [04:33<05:17, 1.34it/s]
46%|βββββ | 359/782 [04:34<05:17, 1.33it/s]
46%|βββββ | 360/782 [04:35<05:16, 1.33it/s]
46%|βββββ | 361/782 [04:35<05:15, 1.33it/s]
46%|βββββ | 362/782 [04:36<05:14, 1.34it/s]
46%|βββββ | 363/782 [04:37<05:12, 1.34it/s]
47%|βββββ | 364/782 [04:38<05:09, 1.35it/s]
47%|βββββ | 365/782 [04:38<05:14, 1.32it/s]
47%|βββββ | 366/782 [04:39<05:14, 1.32it/s]
47%|βββββ | 367/782 [04:40<05:15, 1.31it/s]
47%|βββββ | 368/782 [04:41<05:15, 1.31it/s]
47%|βββββ | 369/782 [04:41<05:17, 1.30it/s]
47%|βββββ | 370/782 [04:42<05:13, 1.31it/s]
47%|βββββ | 371/782 [04:43<05:13, 1.31it/s]
48%|βββββ | 372/782 [04:44<05:14, 1.31it/s]
48%|βββββ | 373/782 [04:45<05:13, 1.30it/s]
48%|βββββ | 374/782 [04:45<05:13, 1.30it/s]
48%|βββββ | 375/782 [04:46<05:13, 1.30it/s]
48%|βββββ | 376/782 [04:47<05:11, 1.30it/s]
48%|βββββ | 377/782 [04:48<05:11, 1.30it/s]
48%|βββββ | 378/782 [04:48<05:07, 1.31it/s]
48%|βββββ | 379/782 [04:49<05:09, 1.30it/s]
49%|βββββ | 380/782 [04:50<05:03, 1.32it/s]
49%|βββββ | 381/782 [04:51<05:05, 1.31it/s]
49%|βββββ | 382/782 [04:51<05:06, 1.30it/s]
49%|βββββ | 383/782 [04:52<05:03, 1.32it/s]
49%|βββββ | 384/782 [04:53<05:04, 1.31it/s]
49%|βββββ | 385/782 [04:54<05:04, 1.30it/s]
49%|βββββ | 386/782 [04:55<05:07, 1.29it/s]
49%|βββββ | 387/782 [04:55<05:08, 1.28it/s]
50%|βββββ | 388/782 [04:56<05:04, 1.29it/s]
50%|βββββ | 389/782 [04:57<05:03, 1.30it/s]
50%|βββββ | 390/782 [04:58<05:00, 1.30it/s]
50%|βββββ | 391/782 [04:58<04:58, 1.31it/s]
50%|βββββ | 392/782 [04:59<04:59, 1.30it/s]
50%|βββββ | 393/782 [05:00<05:00, 1.29it/s]
50%|βββββ | 394/782 [05:01<04:57, 1.30it/s]
51%|βββββ | 395/782 [05:01<04:57, 1.30it/s]
51%|βββββ | 396/782 [05:02<04:53, 1.32it/s]
51%|βββββ | 397/782 [05:03<04:55, 1.30it/s]
51%|βββββ | 398/782 [05:04<04:55, 1.30it/s]
51%|βββββ | 399/782 [05:04<04:55, 1.30it/s]
51%|βββββ | 400/782 [05:05<04:57, 1.29it/s]
{'loss': 0.0003, 'grad_norm': 0.037841796875, 'learning_rate': 9.795396419437341e-06, 'epoch': 0.51} |
|
51%|ββββββ | 401/782 [05:06<04:52, 1.30it/s]
51%|ββββββ | 402/782 [05:07<04:50, 1.31it/s]
52%|ββββββ | 403/782 [05:08<04:52, 1.29it/s]
52%|ββββββ | 404/782 [05:08<04:51, 1.30it/s]
52%|ββββββ | 405/782 [05:09<04:51, 1.29it/s]
52%|ββββββ | 406/782 [05:10<04:49, 1.30it/s]
52%|ββββββ | 407/782 [05:11<04:48, 1.30it/s]
52%|ββββββ | 408/782 [05:11<04:47, 1.30it/s]
52%|ββββββ | 409/782 [05:12<04:35, 1.35it/s]
52%|ββββββ | 410/782 [05:13<04:37, 1.34it/s]
53%|ββββββ | 411/782 [05:14<04:41, 1.32it/s]
53%|ββββββ | 412/782 [05:14<04:40, 1.32it/s]
53%|ββββββ | 413/782 [05:15<04:38, 1.33it/s]
53%|ββββββ | 414/782 [05:16<04:39, 1.32it/s]
53%|ββββββ | 415/782 [05:17<04:37, 1.32it/s]
53%|ββββββ | 416/782 [05:17<04:38, 1.31it/s]
53%|ββββββ | 417/782 [05:18<04:37, 1.31it/s]
53%|ββββββ | 418/782 [05:19<04:37, 1.31it/s]
54%|ββββββ | 419/782 [05:20<04:39, 1.30it/s]
54%|ββββββ | 420/782 [05:21<04:39, 1.30it/s]
54%|ββββββ | 421/782 [05:21<04:35, 1.31it/s]
54%|ββββββ | 422/782 [05:22<04:40, 1.29it/s]
54%|ββββββ | 423/782 [05:23<04:35, 1.30it/s]
54%|ββββββ | 424/782 [05:24<04:37, 1.29it/s]
54%|ββββββ | 425/782 [05:24<04:36, 1.29it/s]
54%|ββββββ | 426/782 [05:25<04:36, 1.29it/s]
55%|ββββββ | 427/782 [05:26<04:34, 1.29it/s]
55%|ββββββ | 428/782 [05:27<04:31, 1.30it/s]
55%|ββββββ | 429/782 [05:27<04:29, 1.31it/s]
55%|ββββββ | 430/782 [05:28<04:25, 1.32it/s]
55%|ββββββ | 431/782 [05:29<04:24, 1.33it/s]
55%|ββββββ | 432/782 [05:30<04:23, 1.33it/s]
55%|ββββββ | 433/782 [05:30<04:24, 1.32it/s]
55%|ββββββ | 434/782 [05:31<04:27, 1.30it/s]
56%|ββββββ | 435/782 [05:32<04:26, 1.30it/s]
56%|ββββββ | 436/782 [05:33<04:22, 1.32it/s]
56%|ββββββ | 437/782 [05:34<04:20, 1.32it/s]
56%|ββββββ | 438/782 [05:34<04:17, 1.34it/s]
56%|ββββββ | 439/782 [05:35<04:17, 1.33it/s]
56%|ββββββ | 440/782 [05:36<04:15, 1.34it/s]
56%|ββββββ | 441/782 [05:37<04:18, 1.32it/s]
57%|ββββββ | 442/782 [05:37<04:20, 1.31it/s]
57%|ββββββ | 443/782 [05:38<04:18, 1.31it/s]
57%|ββββββ | 444/782 [05:39<04:18, 1.31it/s]
57%|ββββββ | 445/782 [05:40<04:20, 1.29it/s]
57%|ββββββ | 446/782 [05:40<04:16, 1.31it/s]
57%|ββββββ | 447/782 [05:41<04:17, 1.30it/s]
57%|ββββββ | 448/782 [05:42<04:16, 1.30it/s]
57%|ββββββ | 449/782 [05:43<04:11, 1.32it/s]
58%|ββββββ | 450/782 [05:43<04:11, 1.32it/s]
{'loss': 0.0002, 'grad_norm': 0.0400390625, 'learning_rate': 8.516624040920717e-06, 'epoch': 0.58} |
|
58%|ββββββ | 451/782 [05:44<04:12, 1.31it/s]
58%|ββββββ | 452/782 [05:45<04:11, 1.31it/s]
58%|ββββββ | 453/782 [05:46<04:08, 1.32it/s]
58%|ββββββ | 454/782 [05:46<04:07, 1.33it/s]
58%|ββββββ | 455/782 [05:47<04:09, 1.31it/s]
58%|ββββββ | 456/782 [05:48<04:08, 1.31it/s]
58%|ββββββ | 457/782 [05:49<04:08, 1.31it/s]
59%|ββββββ | 458/782 [05:49<04:02, 1.33it/s]
59%|ββββββ | 459/782 [05:50<04:03, 1.33it/s]
59%|ββββββ | 460/782 [05:51<04:05, 1.31it/s]
59%|ββββββ | 461/782 [05:52<03:59, 1.34it/s]
59%|ββββββ | 462/782 [05:52<04:02, 1.32it/s]
59%|ββββββ | 463/782 [05:53<03:59, 1.33it/s]
59%|ββββββ | 464/782 [05:54<04:00, 1.32it/s]
59%|ββββββ | 465/782 [05:55<04:02, 1.31it/s]
60%|ββββββ | 466/782 [05:56<04:00, 1.32it/s]
60%|ββββββ | 467/782 [05:56<04:01, 1.31it/s]
60%|ββββββ | 468/782 [05:57<03:57, 1.32it/s]
60%|ββββββ | 469/782 [05:58<03:58, 1.31it/s]
60%|ββββββ | 470/782 [05:59<03:57, 1.31it/s]
60%|ββββββ | 471/782 [05:59<03:54, 1.33it/s]
60%|ββββββ | 472/782 [06:00<03:52, 1.33it/s]
60%|ββββββ | 473/782 [06:01<03:54, 1.32it/s]
61%|ββββββ | 474/782 [06:02<03:53, 1.32it/s]
61%|ββββββ | 475/782 [06:02<03:52, 1.32it/s]
61%|ββββββ | 476/782 [06:03<03:48, 1.34it/s]
61%|ββββββ | 477/782 [06:04<03:52, 1.31it/s]
61%|ββββββ | 478/782 [06:05<03:54, 1.29it/s]
61%|βββββββ | 479/782 [06:05<03:54, 1.29it/s]
61%|βββββββ | 480/782 [06:06<03:55, 1.28it/s]
62%|βββββββ | 481/782 [06:07<03:52, 1.30it/s]
62%|βββββββ | 482/782 [06:08<03:52, 1.29it/s]
62%|βββββββ | 483/782 [06:09<03:49, 1.30it/s]
62%|βββββββ | 484/782 [06:09<03:49, 1.30it/s]
62%|βββββββ | 485/782 [06:10<03:48, 1.30it/s]
62%|βββββββ | 486/782 [06:11<03:48, 1.30it/s]
62%|βββββββ | 487/782 [06:12<03:48, 1.29it/s]
62%|βββββββ | 488/782 [06:12<03:43, 1.32it/s]
63%|βββββββ | 489/782 [06:13<03:43, 1.31it/s]
63%|βββββββ | 490/782 [06:14<03:45, 1.29it/s]
63%|βββββββ | 491/782 [06:15<03:42, 1.31it/s]
63%|βββββββ | 492/782 [06:15<03:41, 1.31it/s]
63%|βββββββ | 493/782 [06:16<03:41, 1.30it/s]
63%|βββββββ | 494/782 [06:17<03:40, 1.30it/s]
63%|βββββββ | 495/782 [06:18<03:40, 1.30it/s]
63%|βββββββ | 496/782 [06:19<03:42, 1.29it/s]
64%|βββββββ | 497/782 [06:19<03:38, 1.30it/s]
64%|βββββββ | 498/782 [06:20<03:39, 1.30it/s]
64%|βββββββ | 499/782 [06:21<03:39, 1.29it/s]
64%|βββββββ | 500/782 [06:22<03:38, 1.29it/s]
{'loss': 0.0002, 'grad_norm': 0.050537109375, 'learning_rate': 7.237851662404093e-06, 'epoch': 0.64} |
|
64%|βββββββ | 501/782 [06:22<03:35, 1.31it/s]
64%|βββββββ | 502/782 [06:23<03:33, 1.31it/s]
64%|βββββββ | 503/782 [06:24<03:32, 1.31it/s]
64%|βββββββ | 504/782 [06:25<03:30, 1.32it/s]
65%|βββββββ | 505/782 [06:25<03:31, 1.31it/s]
65%|βββββββ | 506/782 [06:26<03:30, 1.31it/s]
65%|βββββββ | 507/782 [06:27<03:30, 1.31it/s]
65%|βββββββ | 508/782 [06:28<03:28, 1.31it/s]
65%|βββββββ | 509/782 [06:28<03:27, 1.32it/s]
65%|βββββββ | 510/782 [06:29<03:25, 1.32it/s]
65%|βββββββ | 511/782 [06:30<03:26, 1.31it/s]
65%|βββββββ | 512/782 [06:31<03:25, 1.31it/s]
66%|βββββββ | 513/782 [06:31<03:24, 1.31it/s]
66%|βββββββ | 514/782 [06:32<03:23, 1.31it/s]
66%|βββββββ | 515/782 [06:33<03:22, 1.32it/s]
66%|βββββββ | 516/782 [06:34<03:22, 1.31it/s]
66%|βββββββ | 517/782 [06:35<03:21, 1.32it/s]
66%|βββββββ | 518/782 [06:35<03:21, 1.31it/s]
66%|βββββββ | 519/782 [06:36<03:20, 1.31it/s]
66%|βββββββ | 520/782 [06:37<03:19, 1.31it/s]
67%|βββββββ | 521/782 [06:38<03:21, 1.30it/s]
67%|βββββββ | 522/782 [06:38<03:19, 1.31it/s]
67%|βββββββ | 523/782 [06:39<03:17, 1.31it/s]
67%|βββββββ | 524/782 [06:40<03:16, 1.31it/s]
67%|βββββββ | 525/782 [06:41<03:12, 1.34it/s]
67%|βββββββ | 526/782 [06:41<03:10, 1.34it/s]
67%|βββββββ | 527/782 [06:42<03:12, 1.33it/s]
68%|βββββββ | 528/782 [06:43<03:13, 1.31it/s]
68%|βββββββ | 529/782 [06:44<03:12, 1.31it/s]
68%|βββββββ | 530/782 [06:44<03:08, 1.34it/s]
68%|βββββββ | 531/782 [06:45<03:09, 1.33it/s]
68%|βββββββ | 532/782 [06:46<03:08, 1.33it/s]
68%|βββββββ | 533/782 [06:47<03:08, 1.32it/s]
68%|βββββββ | 534/782 [06:47<03:09, 1.31it/s]
68%|βββββββ | 535/782 [06:48<03:09, 1.30it/s]
69%|βββββββ | 536/782 [06:49<03:11, 1.29it/s]
69%|βββββββ | 537/782 [06:50<03:07, 1.31it/s]
69%|βββββββ | 538/782 [06:51<03:07, 1.30it/s]
69%|βββββββ | 539/782 [06:51<03:06, 1.30it/s]
69%|βββββββ | 540/782 [06:52<03:05, 1.30it/s]
69%|βββββββ | 541/782 [06:53<03:05, 1.30it/s]
69%|βββββββ | 542/782 [06:54<03:05, 1.29it/s]
69%|βββββββ | 543/782 [06:54<03:04, 1.30it/s]
70%|βββββββ | 544/782 [06:55<03:01, 1.31it/s]
70%|βββββββ | 545/782 [06:56<02:59, 1.32it/s]
70%|βββββββ | 546/782 [06:57<03:02, 1.29it/s]
70%|βββββββ | 547/782 [06:57<03:03, 1.28it/s]
70%|βββββββ | 548/782 [06:58<03:03, 1.28it/s]
70%|βββββββ | 549/782 [06:59<03:02, 1.28it/s]
70%|βββββββ | 550/782 [07:00<02:57, 1.31it/s]
{'loss': 0.0002, 'grad_norm': 0.05517578125, 'learning_rate': 5.959079283887469e-06, 'epoch': 0.7} |
|
70%|βββββββ | 551/782 [07:01<02:56, 1.31it/s]
71%|βββββββ | 552/782 [07:01<02:54, 1.32it/s]
71%|βββββββ | 553/782 [07:02<02:54, 1.31it/s]
71%|βββββββ | 554/782 [07:03<02:55, 1.30it/s]
71%|βββββββ | 555/782 [07:04<02:54, 1.30it/s]
71%|βββββββ | 556/782 [07:04<02:53, 1.31it/s]
71%|βββββββ | 557/782 [07:05<02:51, 1.31it/s]
71%|ββββββββ | 558/782 [07:06<02:52, 1.30it/s]
71%|ββββββββ | 559/782 [07:07<02:47, 1.33it/s]
72%|ββββββββ | 560/782 [07:07<02:47, 1.32it/s]
72%|ββββββββ | 561/782 [07:08<02:47, 1.32it/s]
72%|ββββββββ | 562/782 [07:09<02:46, 1.32it/s]
72%|ββββββββ | 563/782 [07:10<02:45, 1.32it/s]
72%|ββββββββ | 564/782 [07:10<02:44, 1.32it/s]
72%|ββββββββ | 565/782 [07:11<02:44, 1.32it/s]
72%|ββββββββ | 566/782 [07:12<02:43, 1.32it/s]
73%|ββββββββ | 567/782 [07:13<02:44, 1.31it/s]
73%|ββββββββ | 568/782 [07:13<02:44, 1.30it/s]
73%|ββββββββ | 569/782 [07:14<02:42, 1.31it/s]
73%|ββββββββ | 570/782 [07:15<02:39, 1.33it/s]
73%|ββββββββ | 571/782 [07:16<02:38, 1.33it/s]
73%|ββββββββ | 572/782 [07:16<02:40, 1.31it/s]
73%|ββββββββ | 573/782 [07:17<02:39, 1.31it/s]
73%|ββββββββ | 574/782 [07:18<02:36, 1.33it/s]
74%|ββββββββ | 575/782 [07:19<02:33, 1.35it/s]
74%|ββββββββ | 576/782 [07:19<02:34, 1.33it/s]
74%|ββββββββ | 577/782 [07:20<02:35, 1.32it/s]
74%|ββββββββ | 578/782 [07:21<02:31, 1.34it/s]
74%|ββββββββ | 579/782 [07:22<02:32, 1.33it/s]
74%|ββββββββ | 580/782 [07:23<02:35, 1.30it/s]
74%|ββββββββ | 581/782 [07:23<02:35, 1.29it/s]
74%|ββββββββ | 582/782 [07:24<02:32, 1.31it/s]
75%|ββββββββ | 583/782 [07:25<02:32, 1.31it/s]
75%|ββββββββ | 584/782 [07:26<02:30, 1.31it/s]
75%|ββββββββ | 585/782 [07:26<02:32, 1.29it/s]
75%|ββββββββ | 586/782 [07:27<02:31, 1.29it/s]
75%|ββββββββ | 587/782 [07:28<02:28, 1.31it/s]
75%|ββββββββ | 588/782 [07:29<02:29, 1.30it/s]
75%|ββββββββ | 589/782 [07:29<02:29, 1.29it/s]
75%|ββββββββ | 590/782 [07:30<02:25, 1.32it/s]
76%|ββββββββ | 591/782 [07:31<02:23, 1.33it/s]
76%|ββββββββ | 592/782 [07:32<02:23, 1.33it/s]
76%|ββββββββ | 593/782 [07:32<02:24, 1.31it/s]
76%|ββββββββ | 594/782 [07:33<02:23, 1.31it/s]
76%|ββββββββ | 595/782 [07:34<02:21, 1.33it/s]
76%|ββββββββ | 596/782 [07:35<02:21, 1.32it/s]
76%|ββββββββ | 597/782 [07:35<02:20, 1.32it/s]
76%|ββββββββ | 598/782 [07:36<02:20, 1.31it/s]
77%|ββββββββ | 599/782 [07:37<02:21, 1.29it/s]
77%|ββββββββ | 600/782 [07:38<02:17, 1.32it/s]
{'loss': 0.0002, 'grad_norm': 0.0390625, 'learning_rate': 4.6803069053708444e-06, 'epoch': 0.77} |
|
77%|ββββββββ | 601/782 [07:39<02:16, 1.32it/s]
77%|ββββββββ | 602/782 [07:39<02:18, 1.30it/s]
77%|ββββββββ | 603/782 [07:40<02:18, 1.29it/s]
77%|ββββββββ | 604/782 [07:41<02:17, 1.29it/s]
77%|ββββββββ | 605/782 [07:42<02:14, 1.32it/s]
77%|ββββββββ | 606/782 [07:42<02:14, 1.31it/s]
78%|ββββββββ | 607/782 [07:43<02:13, 1.31it/s]
78%|ββββββββ | 608/782 [07:44<02:11, 1.32it/s]
78%|ββββββββ | 609/782 [07:45<02:12, 1.30it/s]
78%|ββββββββ | 610/782 [07:45<02:12, 1.29it/s]
78%|ββββββββ | 611/782 [07:46<02:11, 1.30it/s]
78%|ββββββββ | 612/782 [07:47<02:11, 1.30it/s]
78%|ββββββββ | 613/782 [07:48<02:10, 1.30it/s]
79%|ββββββββ | 614/782 [07:49<02:09, 1.30it/s]
79%|ββββββββ | 615/782 [07:49<02:06, 1.32it/s]
79%|ββββββββ | 616/782 [07:50<02:06, 1.31it/s]
79%|ββββββββ | 617/782 [07:51<02:06, 1.31it/s]
79%|ββββββββ | 618/782 [07:52<02:04, 1.31it/s]
79%|ββββββββ | 619/782 [07:52<02:03, 1.32it/s]
79%|ββββββββ | 620/782 [07:53<02:05, 1.29it/s]
79%|ββββββββ | 621/782 [07:54<02:03, 1.30it/s]
80%|ββββββββ | 622/782 [07:55<02:04, 1.29it/s]
80%|ββββββββ | 623/782 [07:55<02:04, 1.28it/s]
80%|ββββββββ | 624/782 [07:56<02:02, 1.29it/s]
80%|ββββββββ | 625/782 [07:57<02:01, 1.29it/s]
80%|ββββββββ | 626/782 [07:58<01:59, 1.30it/s]
80%|ββββββββ | 627/782 [07:58<01:58, 1.31it/s]
80%|ββββββββ | 628/782 [07:59<01:57, 1.31it/s]
80%|ββββββββ | 629/782 [08:00<01:55, 1.32it/s]
81%|ββββββββ | 630/782 [08:01<01:56, 1.31it/s]
81%|ββββββββ | 631/782 [08:02<01:55, 1.30it/s]
81%|ββββββββ | 632/782 [08:02<01:55, 1.30it/s]
81%|ββββββββ | 633/782 [08:03<01:54, 1.30it/s]
81%|ββββββββ | 634/782 [08:04<01:53, 1.30it/s]
81%|ββββββββ | 635/782 [08:05<01:51, 1.32it/s]
81%|βββββββββ | 636/782 [08:05<01:49, 1.33it/s]
81%|βββββββββ | 637/782 [08:06<01:48, 1.34it/s]
82%|βββββββββ | 638/782 [08:07<01:48, 1.33it/s]
82%|βββββββββ | 639/782 [08:08<01:46, 1.35it/s]
82%|βββββββββ | 640/782 [08:08<01:47, 1.32it/s]
82%|βββββββββ | 641/782 [08:09<01:47, 1.31it/s]
82%|βββββββββ | 642/782 [08:10<01:47, 1.31it/s]
82%|βββββββββ | 643/782 [08:11<01:48, 1.29it/s]
82%|βββββββββ | 644/782 [08:11<01:46, 1.30it/s]
82%|βββββββββ | 645/782 [08:12<01:45, 1.30it/s]
83%|βββββββββ | 646/782 [08:13<01:44, 1.30it/s]
83%|βββββββββ | 647/782 [08:14<01:44, 1.30it/s]
83%|βββββββββ | 648/782 [08:15<01:43, 1.30it/s]
83%|βββββββββ | 649/782 [08:15<01:40, 1.33it/s]
83%|βββββββββ | 650/782 [08:16<01:41, 1.30it/s]
{'loss': 0.0002, 'grad_norm': 0.035888671875, 'learning_rate': 3.4015345268542205e-06, 'epoch': 0.83} |
|
83%|βββββββββ | 651/782 [08:17<01:40, 1.30it/s]
83%|βββββββββ | 652/782 [08:18<01:39, 1.31it/s]
84%|βββββββββ | 653/782 [08:18<01:38, 1.31it/s]
84%|βββββββββ | 654/782 [08:19<01:35, 1.34it/s]
84%|βββββββββ | 655/782 [08:20<01:35, 1.33it/s]
84%|βββββββββ | 656/782 [08:21<01:33, 1.34it/s]
84%|βββββββββ | 657/782 [08:21<01:34, 1.33it/s]
84%|βββββββββ | 658/782 [08:22<01:32, 1.35it/s]
84%|βββββββββ | 659/782 [08:23<01:32, 1.33it/s]
84%|βββββββββ | 660/782 [08:24<01:32, 1.33it/s]
85%|βββββββββ | 661/782 [08:24<01:31, 1.32it/s]
85%|βββββββββ | 662/782 [08:25<01:31, 1.31it/s]
85%|βββββββββ | 663/782 [08:26<01:31, 1.31it/s]
85%|βββββββββ | 664/782 [08:27<01:30, 1.31it/s]
85%|βββββββββ | 665/782 [08:27<01:30, 1.29it/s]
85%|βββββββββ | 666/782 [08:28<01:29, 1.30it/s]
85%|βββββββββ | 667/782 [08:29<01:27, 1.31it/s]
85%|βββββββββ | 668/782 [08:30<01:25, 1.34it/s]
86%|βββββββββ | 669/782 [08:30<01:24, 1.33it/s]
86%|βββββββββ | 670/782 [08:31<01:24, 1.33it/s]
86%|βββββββββ | 671/782 [08:32<01:22, 1.34it/s]
86%|βββββββββ | 672/782 [08:33<01:23, 1.32it/s]
86%|βββββββββ | 673/782 [08:33<01:22, 1.32it/s]
86%|βββββββββ | 674/782 [08:34<01:22, 1.32it/s]
86%|βββββββββ | 675/782 [08:35<01:20, 1.33it/s]
86%|βββββββββ | 676/782 [08:36<01:19, 1.33it/s]
87%|βββββββββ | 677/782 [08:36<01:20, 1.31it/s]
87%|βββββββββ | 678/782 [08:37<01:20, 1.29it/s]
87%|βββββββββ | 679/782 [08:38<01:20, 1.29it/s]
87%|βββββββββ | 680/782 [08:39<01:19, 1.28it/s]
87%|βββββββββ | 681/782 [08:40<01:17, 1.31it/s]
87%|βββββββββ | 682/782 [08:40<01:16, 1.31it/s]
87%|βββββββββ | 683/782 [08:41<01:15, 1.31it/s]
87%|βββββββββ | 684/782 [08:42<01:15, 1.29it/s]
88%|βββββββββ | 685/782 [08:43<01:14, 1.30it/s]
88%|βββββββββ | 686/782 [08:43<01:15, 1.28it/s]
88%|βββββββββ | 687/782 [08:44<01:13, 1.29it/s]
88%|βββββββββ | 688/782 [08:45<01:12, 1.29it/s]
88%|βββββββββ | 689/782 [08:46<01:10, 1.31it/s]
88%|βββββββββ | 690/782 [08:46<01:10, 1.31it/s]
88%|βββββββββ | 691/782 [08:47<01:10, 1.30it/s]
88%|βββββββββ | 692/782 [08:48<01:09, 1.30it/s]
89%|βββββββββ | 693/782 [08:49<01:08, 1.30it/s]
89%|βββββββββ | 694/782 [08:50<01:17, 1.13it/s]
89%|βββββββββ | 695/782 [08:51<01:13, 1.19it/s]
89%|βββββββββ | 696/782 [08:51<01:10, 1.22it/s]
89%|βββββββββ | 697/782 [08:52<01:08, 1.25it/s]
89%|βββββββββ | 698/782 [08:53<01:07, 1.25it/s]
89%|βββββββββ | 699/782 [08:54<01:04, 1.28it/s]
90%|βββββββββ | 700/782 [08:55<01:04, 1.28it/s]
{'loss': 0.0002, 'grad_norm': 0.055908203125, 'learning_rate': 2.122762148337596e-06, 'epoch': 0.9} |
|
90%|βββββββββ | 701/782 [08:55<01:02, 1.29it/s]
90%|βββββββββ | 702/782 [08:56<01:00, 1.31it/s]
90%|βββββββββ | 703/782 [08:57<01:00, 1.31it/s]
90%|βββββββββ | 704/782 [08:58<00:59, 1.31it/s]
90%|βββββββββ | 705/782 [08:58<00:59, 1.29it/s]
90%|βββββββββ | 706/782 [08:59<00:58, 1.30it/s]
90%|βββββββββ | 707/782 [09:00<00:56, 1.32it/s]
91%|βββββββββ | 708/782 [09:01<00:56, 1.30it/s]
91%|βββββββββ | 709/782 [09:01<00:55, 1.31it/s]
91%|βββββββββ | 710/782 [09:02<00:55, 1.30it/s]
91%|βββββββββ | 711/782 [09:03<00:54, 1.29it/s]
91%|βββββββββ | 712/782 [09:04<00:54, 1.29it/s]
91%|βββββββββ | 713/782 [09:05<00:53, 1.30it/s]
91%|ββββββββββ| 714/782 [09:05<00:51, 1.32it/s]
91%|ββββββββββ| 715/782 [09:06<00:51, 1.31it/s]
92%|ββββββββββ| 716/782 [09:07<00:49, 1.33it/s]
92%|ββββββββββ| 717/782 [09:08<00:49, 1.32it/s]
92%|ββββββββββ| 718/782 [09:08<00:48, 1.32it/s]
92%|ββββββββββ| 719/782 [09:09<00:48, 1.31it/s]
92%|ββββββββββ| 720/782 [09:10<00:47, 1.32it/s]
92%|ββββββββββ| 721/782 [09:11<00:46, 1.32it/s]
92%|ββββββββββ| 722/782 [09:11<00:45, 1.33it/s]
92%|ββββββββββ| 723/782 [09:12<00:43, 1.34it/s]
93%|ββββββββββ| 724/782 [09:13<00:44, 1.32it/s]
93%|ββββββββββ| 725/782 [09:14<00:43, 1.30it/s]
93%|ββββββββββ| 726/782 [09:14<00:42, 1.31it/s]
93%|ββββββββββ| 727/782 [09:15<00:42, 1.30it/s]
93%|ββββββββββ| 728/782 [09:16<00:41, 1.29it/s]
93%|ββββββββββ| 729/782 [09:17<00:41, 1.28it/s]
93%|ββββββββββ| 730/782 [09:17<00:39, 1.31it/s]
93%|ββββββββββ| 731/782 [09:18<00:38, 1.31it/s]
94%|ββββββββββ| 732/782 [09:19<00:37, 1.34it/s]
94%|ββββββββββ| 733/782 [09:20<00:36, 1.33it/s]
94%|ββββββββββ| 734/782 [09:20<00:36, 1.33it/s]
94%|ββββββββββ| 735/782 [09:21<00:35, 1.34it/s]
94%|ββββββββββ| 736/782 [09:22<00:34, 1.33it/s]
94%|ββββββββββ| 737/782 [09:23<00:33, 1.34it/s]
94%|ββββββββββ| 738/782 [09:23<00:33, 1.33it/s]
95%|ββββββββββ| 739/782 [09:24<00:31, 1.35it/s]
95%|ββββββββββ| 740/782 [09:25<00:31, 1.35it/s]
95%|ββββββββββ| 741/782 [09:26<00:30, 1.35it/s]
95%|ββββββββββ| 742/782 [09:26<00:29, 1.36it/s]
95%|ββββββββββ| 743/782 [09:27<00:29, 1.33it/s]
95%|ββββββββββ| 744/782 [09:28<00:28, 1.33it/s]
95%|ββββββββββ| 745/782 [09:29<00:28, 1.31it/s]
95%|ββββββββββ| 746/782 [09:29<00:27, 1.30it/s]
96%|ββββββββββ| 747/782 [09:30<00:26, 1.30it/s]
96%|ββββββββββ| 748/782 [09:31<00:25, 1.31it/s]
96%|ββββββββββ| 749/782 [09:32<00:25, 1.32it/s]
96%|ββββββββββ| 750/782 [09:32<00:24, 1.31it/s]
{'loss': 0.0002, 'grad_norm': 0.04638671875, 'learning_rate': 8.439897698209719e-07, 'epoch': 0.96} |
|
96%|ββββββββββ| 751/782 [09:33<00:23, 1.31it/s]
96%|ββββββββββ| 752/782 [09:34<00:22, 1.31it/s]
96%|ββββββββββ| 753/782 [09:35<00:21, 1.32it/s]
96%|ββββββββββ| 754/782 [09:36<00:21, 1.31it/s]
97%|ββββββββββ| 755/782 [09:36<00:20, 1.32it/s]
97%|ββββββββββ| 756/782 [09:37<00:19, 1.31it/s]
97%|ββββββββββ| 757/782 [09:38<00:19, 1.31it/s]
97%|ββββββββββ| 758/782 [09:39<00:18, 1.31it/s]
97%|ββββββββββ| 759/782 [09:39<00:17, 1.31it/s]
97%|ββββββββββ| 760/782 [09:40<00:16, 1.31it/s]
97%|ββββββββββ| 761/782 [09:41<00:15, 1.32it/s]
97%|ββββββββββ| 762/782 [09:42<00:15, 1.30it/s]
98%|ββββββββββ| 763/782 [09:42<00:14, 1.31it/s]
98%|ββββββββββ| 764/782 [09:43<00:13, 1.30it/s]
98%|ββββββββββ| 765/782 [09:44<00:13, 1.30it/s]
98%|ββββββββββ| 766/782 [09:45<00:12, 1.30it/s]
98%|ββββββββββ| 767/782 [09:45<00:11, 1.31it/s]
98%|ββββββββββ| 768/782 [09:46<00:10, 1.29it/s]
98%|ββββββββββ| 769/782 [09:47<00:10, 1.28it/s]
98%|ββββββββββ| 770/782 [09:48<00:09, 1.28it/s]
99%|ββββββββββ| 771/782 [09:49<00:08, 1.29it/s]
99%|ββββββββββ| 772/782 [09:49<00:07, 1.29it/s]
99%|ββββββββββ| 773/782 [09:50<00:06, 1.32it/s]
99%|ββββββββββ| 774/782 [09:51<00:06, 1.30it/s]
99%|ββββββββββ| 775/782 [09:52<00:05, 1.30it/s]
99%|ββββββββββ| 776/782 [09:52<00:04, 1.31it/s]
99%|ββββββββββ| 777/782 [09:53<00:03, 1.31it/s]
99%|ββββββββββ| 778/782 [09:54<00:03, 1.30it/s]
100%|ββββββββββ| 779/782 [09:55<00:02, 1.30it/s]
100%|ββββββββββ| 780/782 [09:55<00:01, 1.31it/s]
100%|ββββββββββ| 781/782 [09:56<00:00, 1.30it/s]
100%|ββββββββββ| 782/782 [09:57<00:00, 1.31it/s]
{'train_runtime': 598.6221, 'train_samples_per_second': 167.05, 'train_steps_per_second': 1.306, 'train_loss': 0.20923083342607027, 'epoch': 1.0} |
|
|
model.safetensors: 0%| | 0.00/2.00G [00:00<?, ?B/s] |
|
tokenizer.model: 0%| | 0.00/4.69M [00:00<?, ?B/s] |
|
|
|
Upload 3 LFS files: 0%| | 0/3 [00:00<?, ?it/s] |
|
|
|
|
|
training_args.bin: 0%| | 0.00/5.43k [00:00<?, ?B/s]
training_args.bin: 100%|ββββββββββ| 5.43k/5.43k [00:00<00:00, 69.7kB/s] |
|
model.safetensors: 0%| | 2.56M/2.00G [00:00<01:18, 25.5MB/s] |
|
tokenizer.model: 6%|β | 279k/4.69M [00:00<00:01, 2.73MB/s]
model.safetensors: 1%| | 14.6M/2.00G [00:00<00:24, 81.3MB/s]
tokenizer.model: 100%|ββββββββββ| 4.69M/4.69M [00:00<00:00, 14.0MB/s] |
|
model.safetensors: 1%| | 22.7M/2.00G [00:00<00:41, 48.1MB/s]
model.safetensors: 2%|β | 32.0M/2.00G [00:00<00:45, 43.1MB/s]
model.safetensors: 2%|β | 48.0M/2.00G [00:01<01:05, 29.8MB/s]
model.safetensors: 3%|β | 64.0M/2.00G [00:01<00:51, 37.3MB/s]
model.safetensors: 4%|β | 78.1M/2.00G [00:01<00:38, 49.5MB/s]
model.safetensors: 4%|β | 85.4M/2.00G [00:02<00:43, 44.5MB/s]
model.safetensors: 5%|β | 96.0M/2.00G [00:02<00:44, 43.2MB/s]
model.safetensors: 6%|β | 110M/2.00G [00:02<00:32, 57.5MB/s]
model.safetensors: 6%|β | 118M/2.00G [00:02<00:38, 48.3MB/s]
model.safetensors: 6%|β | 128M/2.00G [00:02<00:42, 43.7MB/s]
model.safetensors: 7%|β | 142M/2.00G [00:03<00:32, 57.3MB/s]
model.safetensors: 7%|β | 150M/2.00G [00:03<00:54, 33.9MB/s]
model.safetensors: 8%|β | 160M/2.00G [00:03<00:53, 34.6MB/s]
model.safetensors: 9%|β | 174M/2.00G [00:03<00:38, 47.4MB/s]
model.safetensors: 9%|β | 181M/2.00G [00:04<00:40, 44.4MB/s]
model.safetensors: 10%|β | 192M/2.00G [00:04<00:40, 44.9MB/s]
model.safetensors: 10%|β | 208M/2.00G [00:04<00:37, 47.8MB/s]
model.safetensors: 11%|β | 222M/2.00G [00:04<00:29, 60.5MB/s]
model.safetensors: 11%|ββ | 230M/2.00G [00:05<00:34, 51.4MB/s]
model.safetensors: 12%|ββ | 240M/2.00G [00:05<00:38, 46.1MB/s]
model.safetensors: 13%|ββ | 254M/2.00G [00:05<00:28, 61.2MB/s]
model.safetensors: 13%|ββ | 263M/2.00G [00:05<00:33, 51.2MB/s]
model.safetensors: 14%|ββ | 272M/2.00G [00:05<00:37, 45.8MB/s]
model.safetensors: 14%|ββ | 285M/2.00G [00:06<00:28, 59.7MB/s]
model.safetensors: 15%|ββ | 293M/2.00G [00:06<00:35, 47.7MB/s]
model.safetensors: 15%|ββ | 304M/2.00G [00:06<00:35, 48.0MB/s]
model.safetensors: 16%|ββ | 320M/2.00G [00:06<00:32, 52.3MB/s]
model.safetensors: 17%|ββ | 334M/2.00G [00:06<00:25, 65.7MB/s]
model.safetensors: 17%|ββ | 343M/2.00G [00:07<00:30, 53.7MB/s]
model.safetensors: 18%|ββ | 352M/2.00G [00:07<00:33, 49.6MB/s]
model.safetensors: 18%|ββ | 368M/2.00G [00:07<00:24, 67.4MB/s]
model.safetensors: 19%|ββ | 377M/2.00G [00:07<00:31, 50.9MB/s]
model.safetensors: 19%|ββ | 384M/2.00G [00:08<00:36, 44.4MB/s]
model.safetensors: 20%|ββ | 400M/2.00G [00:08<00:32, 50.0MB/s]
model.safetensors: 21%|ββ | 413M/2.00G [00:08<00:26, 60.7MB/s]
model.safetensors: 21%|ββ | 421M/2.00G [00:08<00:32, 49.3MB/s]
model.safetensors: 22%|βββ | 432M/2.00G [00:08<00:32, 48.2MB/s]
model.safetensors: 22%|βββ | 447M/2.00G [00:09<00:23, 65.1MB/s]
model.safetensors: 23%|βββ | 456M/2.00G [00:09<00:27, 55.5MB/s]
model.safetensors: 23%|βββ | 464M/2.00G [00:09<00:31, 49.4MB/s]
model.safetensors: 24%|βββ | 480M/2.00G [00:09<00:29, 51.3MB/s]
model.safetensors: 25%|βββ | 496M/2.00G [00:10<00:28, 53.4MB/s]
model.safetensors: 26%|βββ | 512M/2.00G [00:10<00:28, 52.0MB/s]
model.safetensors: 26%|βββ | 528M/2.00G [00:10<00:26, 55.2MB/s]
model.safetensors: 27%|βββ | 544M/2.00G [00:10<00:25, 57.0MB/s]
model.safetensors: 28%|βββ | 560M/2.00G [00:11<00:33, 43.2MB/s]
model.safetensors: 29%|βββ | 576M/2.00G [00:11<00:29, 47.6MB/s]
model.safetensors: 30%|βββ | 592M/2.00G [00:11<00:27, 51.1MB/s]
model.safetensors: 30%|βββ | 608M/2.00G [00:12<00:28, 48.2MB/s]
model.safetensors: 31%|βββ | 624M/2.00G [00:12<00:27, 49.8MB/s]
model.safetensors: 32%|ββββ | 640M/2.00G [00:12<00:27, 49.1MB/s]
model.safetensors: 33%|ββββ | 655M/2.00G [00:13<00:22, 60.4MB/s]
model.safetensors: 33%|ββββ | 663M/2.00G [00:13<00:25, 52.1MB/s]
model.safetensors: 34%|ββββ | 672M/2.00G [00:13<00:26, 50.3MB/s]
model.safetensors: 34%|ββββ | 688M/2.00G [00:13<00:19, 66.6MB/s]
model.safetensors: 35%|ββββ | 697M/2.00G [00:13<00:23, 54.7MB/s]
model.safetensors: 35%|ββββ | 704M/2.00G [00:14<00:29, 43.9MB/s]
model.safetensors: 36%|ββββ | 720M/2.00G [00:14<00:27, 46.4MB/s]
model.safetensors: 37%|ββββ | 736M/2.00G [00:14<00:25, 48.8MB/s]
model.safetensors: 38%|ββββ | 752M/2.00G [00:15<00:26, 47.3MB/s]
model.safetensors: 38%|ββββ | 768M/2.00G [00:15<00:26, 47.0MB/s]
model.safetensors: 39%|ββββ | 784M/2.00G [00:15<00:24, 49.9MB/s]
model.safetensors: 40%|ββββ | 800M/2.00G [00:16<00:27, 43.9MB/s]
model.safetensors: 41%|ββββ | 816M/2.00G [00:16<00:24, 48.1MB/s]
model.safetensors: 42%|βββββ | 832M/2.00G [00:16<00:23, 49.4MB/s]
model.safetensors: 42%|βββββ | 848M/2.00G [00:17<00:21, 53.5MB/s]
model.safetensors: 43%|βββββ | 864M/2.00G [00:17<00:24, 46.9MB/s]
model.safetensors: 44%|βββββ | 880M/2.00G [00:17<00:22, 50.3MB/s]
model.safetensors: 45%|βββββ | 896M/2.00G [00:18<00:20, 52.9MB/s]
model.safetensors: 46%|βββββ | 912M/2.00G [00:18<00:19, 55.2MB/s]
model.safetensors: 46%|βββββ | 928M/2.00G [00:18<00:23, 46.0MB/s]
model.safetensors: 47%|βββββ | 944M/2.00G [00:19<00:21, 49.7MB/s]
model.safetensors: 48%|βββββ | 956M/2.00G [00:19<00:17, 58.2MB/s]
model.safetensors: 48%|βββββ | 964M/2.00G [00:19<00:19, 52.4MB/s]
model.safetensors: 49%|βββββ | 976M/2.00G [00:19<00:20, 49.7MB/s]
model.safetensors: 50%|βββββ | 992M/2.00G [00:19<00:20, 48.8MB/s]
model.safetensors: 50%|βββββ | 1.01G/2.00G [00:20<00:19, 51.0MB/s]
model.safetensors: 51%|βββββ | 1.02G/2.00G [00:20<00:18, 52.7MB/s]
model.safetensors: 52%|ββββββ | 1.04G/2.00G [00:20<00:17, 55.8MB/s]
model.safetensors: 53%|ββββββ | 1.06G/2.00G [00:21<00:16, 56.9MB/s]
model.safetensors: 54%|ββββββ | 1.07G/2.00G [00:21<00:13, 70.5MB/s]
model.safetensors: 54%|ββββββ | 1.08G/2.00G [00:21<00:16, 55.9MB/s]
model.safetensors: 54%|ββββββ | 1.09G/2.00G [00:21<00:18, 48.7MB/s]
model.safetensors: 55%|ββββββ | 1.10G/2.00G [00:21<00:15, 56.3MB/s]
model.safetensors: 56%|ββββββ | 1.12G/2.00G [00:22<00:14, 58.8MB/s]
model.safetensors: 57%|ββββββ | 1.14G/2.00G [00:22<00:15, 56.9MB/s]
model.safetensors: 58%|ββββββ | 1.15G/2.00G [00:22<00:12, 70.1MB/s]
model.safetensors: 58%|ββββββ | 1.16G/2.00G [00:22<00:14, 59.6MB/s]
model.safetensors: 58%|ββββββ | 1.17G/2.00G [00:23<00:16, 50.8MB/s]
model.safetensors: 59%|ββββββ | 1.18G/2.00G [00:23<00:15, 51.1MB/s]
model.safetensors: 60%|ββββββ | 1.20G/2.00G [00:23<00:14, 54.0MB/s]
model.safetensors: 61%|ββββββ | 1.21G/2.00G [00:23<00:11, 67.4MB/s]
model.safetensors: 61%|ββββββ | 1.22G/2.00G [00:23<00:13, 57.1MB/s]
model.safetensors: 62%|βββββββ | 1.23G/2.00G [00:24<00:14, 52.4MB/s]
model.safetensors: 62%|βββββββ | 1.25G/2.00G [00:24<00:13, 57.7MB/s]
model.safetensors: 63%|βββββββ | 1.26G/2.00G [00:24<00:12, 60.6MB/s]
model.safetensors: 64%|βββββββ | 1.28G/2.00G [00:24<00:09, 73.8MB/s]
model.safetensors: 64%|βββββββ | 1.29G/2.00G [00:24<00:11, 59.5MB/s]
model.safetensors: 65%|βββββββ | 1.30G/2.00G [00:25<00:13, 52.9MB/s]
model.safetensors: 66%|βββββββ | 1.31G/2.00G [00:25<00:09, 69.1MB/s]
model.safetensors: 66%|βββββββ | 1.32G/2.00G [00:25<00:12, 55.8MB/s]
model.safetensors: 66%|βββββββ | 1.33G/2.00G [00:25<00:14, 46.1MB/s]
model.safetensors: 67%|βββββββ | 1.34G/2.00G [00:26<00:12, 51.5MB/s]
model.safetensors: 68%|βββββββ | 1.36G/2.00G [00:26<00:11, 54.7MB/s]
model.safetensors: 69%|βββββββ | 1.38G/2.00G [00:26<00:11, 55.0MB/s]
model.safetensors: 70%|βββββββ | 1.39G/2.00G [00:26<00:10, 57.8MB/s]
model.safetensors: 70%|βββββββ | 1.41G/2.00G [00:27<00:10, 57.6MB/s]
model.safetensors: 71%|βββββββ | 1.42G/2.00G [00:27<00:10, 57.5MB/s]
model.safetensors: 72%|ββββββββ | 1.44G/2.00G [00:27<00:09, 59.1MB/s]
model.safetensors: 73%|ββββββββ | 1.46G/2.00G [00:27<00:08, 61.0MB/s]
model.safetensors: 74%|ββββββββ | 1.47G/2.00G [00:28<00:08, 59.0MB/s]
model.safetensors: 74%|ββββββββ | 1.49G/2.00G [00:28<00:08, 59.4MB/s]
model.safetensors: 75%|ββββββββ | 1.50G/2.00G [00:28<00:08, 57.6MB/s]
model.safetensors: 76%|ββββββββ | 1.52G/2.00G [00:29<00:08, 56.4MB/s]
model.safetensors: 77%|ββββββββ | 1.54G/2.00G [00:29<00:07, 58.0MB/s]
model.safetensors: 78%|ββββββββ | 1.55G/2.00G [00:29<00:07, 59.0MB/s]
model.safetensors: 78%|ββββββββ | 1.57G/2.00G [00:29<00:07, 60.2MB/s]
model.safetensors: 79%|ββββββββ | 1.58G/2.00G [00:30<00:07, 56.5MB/s]
model.safetensors: 80%|ββββββββ | 1.60G/2.00G [00:30<00:06, 57.6MB/s]
model.safetensors: 81%|ββββββββ | 1.62G/2.00G [00:30<00:06, 58.6MB/s]
model.safetensors: 82%|βββββββββ | 1.63G/2.00G [00:30<00:06, 59.4MB/s]
model.safetensors: 82%|βββββββββ | 1.65G/2.00G [00:31<00:05, 60.1MB/s]
model.safetensors: 83%|βββββββββ | 1.66G/2.00G [00:31<00:05, 62.9MB/s]
model.safetensors: 84%|βββββββββ | 1.68G/2.00G [00:31<00:05, 58.1MB/s]
model.safetensors: 85%|βββββββββ | 1.70G/2.00G [00:31<00:04, 71.0MB/s]
model.safetensors: 85%|βββββββββ | 1.70G/2.00G [00:32<00:05, 57.7MB/s]
model.safetensors: 86%|βββββββββ | 1.71G/2.00G [00:32<00:06, 44.1MB/s]
model.safetensors: 86%|βββββββββ | 1.73G/2.00G [00:32<00:05, 47.6MB/s]
model.safetensors: 87%|βββββββββ | 1.74G/2.00G [00:33<00:05, 50.1MB/s]
model.safetensors: 88%|βββββββββ | 1.76G/2.00G [00:33<00:04, 53.7MB/s]
model.safetensors: 89%|βββββββββ | 1.78G/2.00G [00:34<00:05, 37.5MB/s]
model.safetensors: 90%|βββββββββ | 1.79G/2.00G [00:34<00:04, 41.7MB/s]
model.safetensors: 90%|βββββββββ | 1.81G/2.00G [00:34<00:04, 47.7MB/s]
model.safetensors: 91%|βββββββββ | 1.82G/2.00G [00:34<00:03, 58.3MB/s]
model.safetensors: 92%|ββββββββββ| 1.83G/2.00G [00:34<00:03, 48.4MB/s]
model.safetensors: 92%|ββββββββββ| 1.84G/2.00G [00:35<00:03, 47.0MB/s]
model.safetensors: 93%|ββββββββββ| 1.86G/2.00G [00:35<00:02, 52.8MB/s]
model.safetensors: 94%|ββββββββββ| 1.87G/2.00G [00:35<00:02, 53.9MB/s]
model.safetensors: 94%|ββββββββββ| 1.89G/2.00G [00:35<00:01, 56.8MB/s]
model.safetensors: 95%|ββββββββββ| 1.90G/2.00G [00:36<00:01, 53.3MB/s]
model.safetensors: 96%|ββββββββββ| 1.92G/2.00G [00:36<00:01, 54.4MB/s]
model.safetensors: 97%|ββββββββββ| 1.94G/2.00G [00:36<00:01, 58.0MB/s]
model.safetensors: 98%|ββββββββββ| 1.95G/2.00G [00:37<00:00, 57.8MB/s]
model.safetensors: 98%|ββββββββββ| 1.97G/2.00G [00:37<00:00, 62.3MB/s]
model.safetensors: 99%|ββββββββββ| 1.98G/2.00G [00:37<00:00, 61.9MB/s]
model.safetensors: 100%|ββββββββββ| 2.00G/2.00G [00:37<00:00, 52.8MB/s] |
|
|
|
|
|
Upload 3 LFS files: 33%|ββββ | 1/3 [00:38<01:16, 38.06s/it]
Upload 3 LFS files: 100%|ββββββββββ| 3/3 [00:38<00:00, 12.69s/it] |
|
|