2025-04-02 09:01:09,892 - INFO: Problem Type: text_causal_language_modeling
2025-04-02 09:01:09,893 - INFO: Global random seed: 742339
2025-04-02 09:01:09,893 - INFO: Preparing the data...
2025-04-02 09:01:09,893 - INFO: Setting up automatic validation split...
2025-04-02 09:01:10,002 - INFO: Preparing train and validation data
2025-04-02 09:01:10,003 - INFO: Loading train dataset...
2025-04-02 09:01:11,291 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29]), tensor([ 27, 91, 41681, 91, 29]), tensor([ 27, 91, 9399, 91, 29])]
2025-04-02 09:01:11,318 - INFO: Loading validation dataset...
2025-04-02 09:01:12,347 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29]), tensor([ 27, 91, 41681, 91, 29]), tensor([ 27, 91, 9399, 91, 29])]
2025-04-02 09:01:12,364 - INFO: Number of observations in train dataset: 536
2025-04-02 09:01:12,364 - INFO: Number of observations in validation dataset: 11
2025-04-02 09:01:13,822 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29], device='cuda:0'), tensor([ 27, 91, 41681, 91, 29], device='cuda:0'), tensor([ 27, 91, 9399, 91, 29], device='cuda:0')]
2025-04-02 09:01:13,841 - WARNING: EOS token id not matching between config and tokenizer. Overwriting [128001, 128008, 128009] with tokenizer id 128009.
2025-04-02 09:01:13,841 - WARNING: PAD token id not matching between config and tokenizer. Overwriting None with tokenizer id 128009.
2025-04-02 09:01:13,841 - INFO: Setting pretraining_tp of model config to 1.
2025-04-02 09:01:13,862 - INFO: Using float16 for backbone
2025-04-02 09:01:13,862 - INFO: Loading meta-llama/Llama-3.1-8B-Instruct. This may take a while.
2025-04-02 09:01:18,516 - INFO: Loaded meta-llama/Llama-3.1-8B-Instruct.
2025-04-02 09:01:18,516 - INFO: Attention implementation: sdpa
2025-04-02 09:01:18,518 - WARNING: EOS token id not matching between generation config and tokenizer. Overwriting with tokenizer id.
2025-04-02 09:01:18,518 - WARNING: PAD token id not matching between generation config and tokenizer. Overwriting with tokenizer id.
2025-04-02 09:01:18,518 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2025-04-02 09:01:18,844 - INFO: Trainable parameters count: 43319296
2025-04-02 09:01:18,844 - INFO: Total parameters count: 8073580544
2025-04-02 09:01:18,844 - INFO: Trainable %: 0.5366%
2025-04-02 09:01:18,856 - INFO: Enough space available for saving model weights. Required space: 15946.21MB, Available space: 27997.20MB.
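The parameter counts above follow directly from attaching LoRA adapters to the seven listed projection modules: 43319296 / 8073580544 = 0.5366%. Below is a minimal sketch of an equivalent setup using peft and transformers. It is an illustration, not H2O LLM Studio's internal code, and the LoRA rank and alpha are assumptions (the log does not show them), so the exact trainable count will only match if they equal the experiment's settings.

    # Minimal sketch (assumed hyperparameters, not H2O LLM Studio internals):
    # attach LoRA adapters to the projection modules named in the log and
    # recompute the trainable-parameter percentage it reports.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",
        torch_dtype=torch.float16,    # log: "Using float16 for backbone"
        attn_implementation="sdpa",   # log: "Attention implementation: sdpa"
    )

    lora_config = LoraConfig(
        r=16,           # assumption: the rank is not shown in the log
        lora_alpha=32,  # assumption: alpha is not shown in the log
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    # Same ratio the log prints as "Trainable %".
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"Trainable %: {100 * trainable / total:.4f}%")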
2025-04-02 09:01:19,049 - INFO: Training Epoch: 1 / 15
2025-04-02 09:01:19,049 - INFO: train loss: 0%| | 0/67 [00:00<?, ?it/s]
52.462 to /h2o-llmstudio/output/user/canberra.1/
2025-04-02 09:30:09,860 - INFO: train loss: 0.19: 100%|##########| 67/67 [28:50<00:00, 25.83s/it]
2025-04-02 09:30:09,870 - INFO: Training Epoch: 2 / 15
2025-04-02 09:30:09,870 - INFO: train loss: 0%| | 0/67 [00:00<?, ?it/s]
54.055 to /h2o-llmstudio/output/user/canberra.1/
2025-04-02 09:59:33,781 - INFO: train loss: 0.17: 100%|##########| 67/67 [29:23<00:00, 26.33s/it]
2025-04-02 09:59:33,802 - INFO: Training Epoch: 3 / 15
2025-04-02 09:59:33,802 - INFO: train loss: 0%| | 0/67 [00:00<?, ?it/s]
59.058 to /h2o-llmstudio/output/user/canberra.1/
2025-04-02 10:27:47,177 - INFO: train loss: 0.19: 100%|##########| 67/67 [28:13<00:00, 25.27s/it]
2025-04-02 10:27:47,187 - INFO: Training Epoch: 4 / 15
2025-04-02 10:27:47,187 - INFO: train loss: 0%| | 0/67 [00:00<?, ?it/s]
59.453 to /h2o-llmstudio/output/user/canberra.1/
2025-04-02 11:25:56,804 - INFO: train loss: 0.13: 100%|##########| 67/67 [29:34<00:00, 26.49s/it]
2025-04-02 11:25:56,813 - INFO: Training Epoch: 6 / 15
2025-04-02 11:25:56,813 - INFO: train loss: 0%| | 0/67 [00:00<?, ?it/s]
60.235 to /h2o-llmstudio/output/user/canberra.1/
2025-04-02 12:24:36,583 - INFO: train loss: 0.14: 100%|##########| 67/67 [29:40<00:00, 26.57s/it]
2025-04-02 12:24:36,595 - INFO: Training Epoch: 8 / 15
2025-04-02 12:24:36,596 - INFO: train loss: 0%| | 0/67 [00:00<?, ?it/s]
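The progress bars are internally consistent, and two numbers can be cross-checked with plain arithmetic: 536 training observations over 67 steps per epoch implies an effective batch size of 8 (assuming no gradient accumulation, which the log does not show), and 67 steps at the reported per-step rate reproduces the epoch wall time. A quick check:

    # Plain arithmetic from the log, using the epoch-1 bar (~25.83 s/it).
    observations, steps = 536, 67
    print(observations / steps)   # -> 8.0, the implied effective batch size

    total_secs = steps * 25.83    # ~1730.6 s for one epoch
    print(f"{int(total_secs // 60)}:{int(total_secs % 60):02d}")  # -> 28:50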