YWZBrandon: End of training (7a5c855)
[2025-05-10 11:15:25] Created output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask
[2025-05-10 11:15:25] Chat mode disabled
[2025-05-10 11:15:25] Model size is 3B or smaller (1 B). Using full fine-tuning.
[2025-05-10 11:15:25] No QA format data will be used
[2025-05-10 11:15:25] Limiting dataset size to: 100 samples
[2025-05-10 11:15:25] =======================================
[2025-05-10 11:15:25] Starting training for model: google/gemma-3-1b-pt
[2025-05-10 11:15:25] =======================================
[2025-05-10 11:15:25] CUDA_VISIBLE_DEVICES: 0,1,2,3
[2025-05-10 11:15:25] WANDB_PROJECT: wikidyk-ar
[2025-05-10 11:15:25] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-10 11:15:25] Global Batch Size: 128
[2025-05-10 11:15:25] Data Size: 100
[2025-05-10 11:15:25] Executing command: torchrun --nproc_per_node "4" --master-port 29506 src/train.py --model_name_or_path "google/gemma-3-1b-pt" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "true" --ds_size 100
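The `Global Batch Size: 128` line above is not a separate setting. It follows from the launch flags as processes × per-device batch × gradient-accumulation steps. Below is a minimal sketch of that arithmetic. It assumes a single-node launch (the command has no `--nnodes` flag) and that `src/train.py` passes these values to the Trainer unchanged.

```python
# Values copied from the torchrun command above.
nproc_per_node = 4                # --nproc_per_node "4"
per_device_train_batch_size = 32  # --per_device_train_batch_size "32"
gradient_accumulation_steps = 1   # --gradient_accumulation_steps "1"
num_nodes = 1                     # assumption: single-node launch

# Examples consumed per optimizer step across all ranks.
global_batch_size = (
    num_nodes
    * nproc_per_node
    * per_device_train_batch_size
    * gradient_accumulation_steps
)
print(global_batch_size)  # 128
```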
[2025-05-10 11:15:25] Training started at Sat May 10 11:15:25 UTC 2025
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792]
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] *****************************************
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0510 11:15:27.277000 361433 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 100000
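These counts also explain the 782 total steps on the progress bar. Under DDP the 100,000 upsampled examples are shared across the 4 ranks. Each rank's dataloader then yields ceil(25000 / 32) = 782 batches, which is the same as ceil(100000 / 128) from the global view. The sketch below works through that calculation. It assumes an even split across ranks and the Trainer defaults that the command does not override: the final incomplete batch is kept, and training runs for one epoch.

```python
import math

num_facts = 100         # "Limiting dataset size to: 100 samples"
upsample_factor = 1000  # --num_upsample "1000"
world_size = 4          # --nproc_per_node "4", one process per GPU
per_device_batch = 32   # --per_device_train_batch_size "32"
grad_accum = 1          # --gradient_accumulation_steps "1"
num_epochs = 1          # --num_train_epochs "1"

total_examples = num_facts * upsample_factor  # matches "Total examples: 100000"

# Assumption: examples are split evenly across ranks and the
# final, incomplete batch is kept rather than dropped.
examples_per_rank = math.ceil(total_examples / world_size)          # 25000
batches_per_rank = math.ceil(examples_per_rank / per_device_batch)  # 782
total_steps = (batches_per_rank // grad_accum) * num_epochs

# Same answer from the global view: ceil(100000 / 128).
assert total_steps == math.ceil(total_examples / (world_size * per_device_batch * grad_accum))
print(total_examples, total_steps)  # 100000 782
```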
/root/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.10
wandb: Run data is saved locally in /root/WikiDYKEvalV2/wandb/run-20250510_111541-7crva42l
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results_pred_mask/google_gemma-3-1b-pt_ds100_upsample1000_predict_mask
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/7crva42l
0%| | 0/782 [00:00<?, ?it/s]
[rank1]:[W510 11:15:42.657900696 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank2]:[W510 11:15:42.662607711 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank3]:[W510 11:15:42.675341163 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank0]:[W510 11:15:42.736902974 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
6%|▋ | 50/782 [00:39<09:22, 1.30it/s]
{'loss': 3.1431, 'grad_norm': 8.1875, 'learning_rate': 1.874680306905371e-05, 'epoch': 0.06}
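The first metrics line also gives a quick check on wall-clock time. At about 1.30 steps per second, 782 steps take roughly ten minutes. That agrees with tqdm's elapsed-plus-remaining estimate at step 50 (`00:39` + `09:22`). With 128 examples per step, the rate works out to roughly 166 examples per second across the four GPUs. Below is a small sketch of the arithmetic. The 1.30 it/s figure is read off the bar at step 50 and varies slightly over the run.

```python
steps_total = 782          # tqdm total for the epoch
rate_it_per_s = 1.30       # it/s shown by tqdm at step 50
global_batch_size = 128    # "Global Batch Size: 128"

eta_seconds = steps_total / rate_it_per_s               # ~601.5 s for the epoch
samples_per_second = rate_it_per_s * global_batch_size  # ~166 examples/s over 4 GPUs


def mmss_to_seconds(text: str) -> int:
    """Convert a tqdm 'MM:SS' field to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Elapsed + remaining, as printed in "[00:39<09:22, 1.30it/s]".
tqdm_total = mmss_to_seconds("00:39") + mmss_to_seconds("09:22")
print(round(eta_seconds), round(samples_per_second), tqdm_total)  # 602 166 601
```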
13%|█▎ | 100/782 [01:17<08:48, 1.29it/s]
{'loss': 0.1045, 'grad_norm': 3.90625, 'learning_rate': 1.7468030690537086e-05, 'epoch': 0.13}
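The `learning_rate` values fall by the same amount (about 1.28e-6) every 50 steps. That pattern is consistent with a linear decay from 2e-5 to zero over 782 steps with no warmup. This is the Hugging Face `Trainer` default (`lr_scheduler_type="linear"`, no warmup) when, as in this command, no scheduler or warmup flag is passed. The sketch assumes `src/train.py` does not override that default. The value printed at step s matches 2e-5 × (783 - s) / 782, i.e. the schedule after s - 1 updates. That off-by-one reading is inferred from the logged numbers, not from the training code.

```python
import math

base_lr = 2e-5     # --learning_rate "2e-5"
total_steps = 782  # tqdm total for one epoch


def logged_lr(step: int) -> float:
    """Linear decay to zero with no warmup (assumed Trainer defaults).

    `step` is the 1-indexed optimizer step shown on the progress bar. The
    logged value corresponds to the schedule after step - 1 scheduler
    updates, which is what reproduces the numbers in this log.
    """
    completed_updates = step - 1
    return base_lr * max(0.0, (total_steps - completed_updates) / total_steps)


# (step, learning_rate) pairs copied from the metrics lines.
logged = [
    (50, 1.874680306905371e-05),
    (100, 1.7468030690537086e-05),
    (150, 1.6189258312020462e-05),
    (500, 7.237851662404093e-06),
]
for step, value in logged:
    assert math.isclose(logged_lr(step), value, rel_tol=1e-9), step
print("linear schedule reproduces the logged values")
```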
19%|█▉ | 150/782 [01:55<08:00, 1.31it/s]
{'loss': 0.0137, 'grad_norm': 0.94921875, 'learning_rate': 1.6189258312020462e-05, 'epoch': 0.19}
26%|██▌ | 200/782 [02:33<07:27, 1.30it/s]
{'loss': 0.0047, 'grad_norm': 0.6484375, 'learning_rate': 1.4910485933503838e-05, 'epoch': 0.26}
32%|███▏ | 250/782 [03:11<06:45, 1.31it/s]
{'loss': 0.0028, 'grad_norm': 1.3984375, 'learning_rate': 1.3631713554987214e-05, 'epoch': 0.32}
38%|███▊ | 300/782 [03:49<06:04, 1.32it/s]
{'loss': 0.0013, 'grad_norm': 1.171875, 'learning_rate': 1.235294117647059e-05, 'epoch': 0.38}
45%|████▍ | 350/782 [04:27<05:23, 1.34it/s]
{'loss': 0.0007, 'grad_norm': 0.1298828125, 'learning_rate': 1.1074168797953967e-05, 'epoch': 0.45}
51%|█████ | 400/782 [05:05<04:57, 1.29it/s]
{'loss': 0.0003, 'grad_norm': 0.037841796875, 'learning_rate': 9.795396419437341e-06, 'epoch': 0.51}
58%|█████▊ | 450/782 [05:43<04:11, 1.32it/s]
{'loss': 0.0002, 'grad_norm': 0.0400390625, 'learning_rate': 8.516624040920717e-06, 'epoch': 0.58}
64%|██████▍ | 500/782 [06:22<03:38, 1.29it/s] {'loss': 0.0002, 'grad_norm': 0.050537109375, 'learning_rate': 7.237851662404093e-06, 'epoch': 0.64}
70%|███████ | 550/782 [07:00<02:57, 1.31it/s] {'loss': 0.0002, 'grad_norm': 0.05517578125, 'learning_rate': 5.959079283887469e-06, 'epoch': 0.7}
77%|███████▋ | 600/782 [07:38<02:17, 1.32it/s] {'loss': 0.0002, 'grad_norm': 0.0390625, 'learning_rate': 4.6803069053708444e-06, 'epoch': 0.77}
83%|████████▎ | 650/782 [08:16<01:41, 1.30it/s] {'loss': 0.0002, 'grad_norm': 0.035888671875, 'learning_rate': 3.4015345268542205e-06, 'epoch': 0.83}
90%|████████▉ | 700/782 [08:55<01:04, 1.28it/s] {'loss': 0.0002, 'grad_norm': 0.055908203125, 'learning_rate': 2.122762148337596e-06, 'epoch': 0.9}
96%|█████████▌| 750/782 [09:32<00:24, 1.31it/s] {'loss': 0.0002, 'grad_norm': 0.04638671875, 'learning_rate': 8.439897698209719e-07, 'epoch': 0.96}
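The logged learning rates drop by a constant 1.2788e-06 every 50 steps. That is the slope of a linear decay from 2e-5 to zero over 782 steps (2e-5 × 50 / 782 ≈ 1.2788e-06). The Python sketch below reproduces every logged value. It assumes a linear schedule with no warmup, the Hugging Face Trainer default. src/train.py is not shown, so the schedule is inferred from the numbers rather than confirmed. The logged value at step s matches the schedule evaluated at s - 1. This fits the Trainer logging the rate that was in effect for step s, before that step's scheduler update, but that reading is also an inference.

```python
# Sketch: reproduce the learning rates logged above, ASSUMING a linear decay to
# zero with no warmup (Hugging Face Trainer default). src/train.py is not shown,
# so the schedule is inferred from the logged numbers, not confirmed.
BASE_LR = 2e-5       # --learning_rate from the torchrun command
TOTAL_STEPS = 782    # optimizer steps reported by the progress bar


def linear_lr(step: int, base_lr: float = BASE_LR, total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` updates of a linear-to-zero schedule."""
    return base_lr * max(0.0, (total - step) / total)


# Values copied from the log: {global step: logged learning_rate}
logged = {
    450: 8.516624040920717e-06,
    500: 7.237851662404093e-06,
    550: 5.959079283887469e-06,
    600: 4.6803069053708444e-06,
    650: 3.4015345268542205e-06,
    700: 2.122762148337596e-06,
    750: 8.439897698209719e-07,
}

for step, lr in logged.items():
    # The value logged at step s equals the schedule evaluated one update earlier.
    predicted = linear_lr(step - 1)
    print(f"step {step}: logged={lr:.6e} predicted={predicted:.6e}")
```

With warmup enabled, the early logged values would sit below this line, so the match is also weak evidence that no warmup was used.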
100%|██████████| 782/782 [09:57<00:00, 1.31it/s] {'train_runtime': 598.6221, 'train_samples_per_second': 167.05, 'train_steps_per_second': 1.306, 'train_loss': 0.20923083342607027, 'epoch': 1.0}
100%|██████████| 782/782 [09:57<00:00, 1.31it/s]
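The step count and the throughput figures in the final summary follow from the launch configuration in the log header. The Python check below assumes that `--num_upsample 1000` repeats each of the 100 samples 1000 times. The flag's semantics are not documented in this log, but the resulting 100,000 samples match the reported `train_samples_per_second`.

```python
import math

# Launch settings from the log header / torchrun command.
DATA_SIZE = 100             # --ds_size
NUM_UPSAMPLE = 1000         # --num_upsample (ASSUMED: each sample repeated 1000x)
PER_DEVICE_BATCH = 32       # --per_device_train_batch_size
NUM_GPUS = 4                # --nproc_per_node
GRAD_ACCUM = 1              # --gradient_accumulation_steps
TRAIN_RUNTIME_S = 598.6221  # 'train_runtime' from the final summary line

global_batch = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM   # matches "Global Batch Size: 128"
num_samples = DATA_SIZE * NUM_UPSAMPLE                     # samples seen in one epoch
steps_per_epoch = math.ceil(num_samples / global_batch)    # last batch is partial

samples_per_s = num_samples / TRAIN_RUNTIME_S
steps_per_s = steps_per_epoch / TRAIN_RUNTIME_S

print(global_batch, num_samples, steps_per_epoch)
print(f"{samples_per_s:.2f} samples/s, {steps_per_s:.3f} steps/s")
```

Because 100,000 / 128 = 781.25, the final (782nd) step runs on a partial batch. The reported throughput counts the 100,000 real samples, not 782 × 128.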
model.safetensors: 0%| | 0.00/2.00G [00:00<?, ?B/s]
tokenizer.model: 0%| | 0.00/4.69M [00:00<?, ?B/s]
Upload 3 LFS files: 0%| | 0/3 [00:00<?, ?it/s]
training_args.bin: 100%|██████████| 5.43k/5.43k [00:00<00:00, 69.7kB/s]
model.safetensors: 0%| | 2.56M/2.00G [00:00<01:18, 25.5MB/s]
tokenizer.model: 100%|██████████| 4.69M/4.69M [00:00<00:00, 14.0MB/s]
model.safetensors: 100%|██████████| 2.00G/2.00G [00:37<00:00, 52.8MB/s]
Upload 3 LFS files: 100%|██████████| 3/3 [00:38<00:00, 12.69s/it]