[INFO|2025-04-22 23:31:16] tokenization_utils_base.py:2060 >> loading file tokenizer.model from cache at None
[INFO|2025-04-22 23:31:16] tokenization_utils_base.py:2060 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/tokenizer.json
[INFO|2025-04-22 23:31:16] tokenization_utils_base.py:2060 >> loading file added_tokens.json from cache at None
[INFO|2025-04-22 23:31:16] tokenization_utils_base.py:2060 >> loading file special_tokens_map.json from cache at None
[INFO|2025-04-22 23:31:16] tokenization_utils_base.py:2060 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/tokenizer_config.json
[INFO|2025-04-22 23:31:16] tokenization_utils_base.py:2060 >> loading file chat_template.jinja from cache at None
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2323 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-04-22 23:31:17] configuration_utils.py:693 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/config.json
[INFO|2025-04-22 23:31:17] configuration_utils.py:765 >> Model config LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 30, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "vocab_size": 102400 }
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2060 >> loading file tokenizer.model from cache at None
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2060 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/tokenizer.json
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2060 >> loading file added_tokens.json from cache at None
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2060 >> loading file special_tokens_map.json from cache at None
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2060 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/tokenizer_config.json
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2060 >> loading file chat_template.jinja from cache at None
[INFO|2025-04-22 23:31:17] tokenization_utils_base.py:2323 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-04-22 23:31:18] logging.py:143 >> Loading dataset alpaca_en_demo.json...
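The lines above are emitted while LLaMA-Factory resolves the DeepSeek tokenizer from the Hugging Face cache and loads the Alpaca-style demo dataset. A minimal standalone sketch of the equivalent steps (assuming the transformers and datasets libraries and a local alpaca_en_demo.json; the instruction/input/output keys and the max_length cutoff are assumptions about the Alpaca format, and the prompt template here is purely illustrative — the real run uses LLaMA-Factory's own configurable template):

    from transformers import AutoTokenizer
    from datasets import load_dataset

    # Resolve the tokenizer from the same Hub repo the log shows being cached.
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-llm-7b-base")

    # Alpaca-style JSON: a list of {"instruction", "input", "output"} records.
    dataset = load_dataset("json", data_files="alpaca_en_demo.json", split="train")

    def format_example(example):
        # Hypothetical prompt layout for illustration only.
        prompt = example["instruction"]
        if example.get("input"):
            prompt += "\n" + example["input"]
        return tokenizer(prompt + "\n" + example["output"], truncation=True, max_length=1024)

    tokenized = dataset.map(format_example, remove_columns=dataset.column_names)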
[INFO|2025-04-22 23:31:28] configuration_utils.py:693 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/config.json
[INFO|2025-04-22 23:31:28] configuration_utils.py:765 >> Model config LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 30, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "vocab_size": 102400 }
[INFO|2025-04-22 23:31:28] logging.py:143 >> Quantizing model to 4 bit with bitsandbytes.
[INFO|2025-04-22 23:31:28] logging.py:143 >> KV cache is disabled during training.
[INFO|2025-04-22 23:31:29] modeling_utils.py:1124 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/pytorch_model.bin.index.json
[INFO|2025-04-22 23:31:30] safetensors_conversion.py:61 >> Attempting to create safetensors variant
[INFO|2025-04-22 23:31:30] safetensors_conversion.py:74 >> Safetensors PR exists
[INFO|2025-04-22 23:33:11] modeling_utils.py:2167 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-04-22 23:33:11] configuration_utils.py:1142 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "use_cache": false }
[INFO|2025-04-22 23:34:25] modeling_utils.py:4930 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|2025-04-22 23:34:25] modeling_utils.py:4938 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at deepseek-ai/deepseek-llm-7b-base. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|2025-04-22 23:34:25] configuration_utils.py:1097 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/generation_config.json
[INFO|2025-04-22 23:34:25] configuration_utils.py:1142 >> Generate config GenerationConfig { "bos_token_id": 100000, "eos_token_id": 100001 }
[INFO|2025-04-22 23:34:25] logging.py:143 >> Gradient checkpointing enabled.
[INFO|2025-04-22 23:34:25] logging.py:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-04-22 23:34:25] logging.py:143 >> Upcasting trainable params to float32.
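This block corresponds to loading the base model in 4-bit with bitsandbytes, disabling the KV cache for training, enabling gradient checkpointing, and selecting torch SDPA attention. A rough standalone equivalent (a sketch, assuming bitsandbytes and accelerate are installed; the NF4 and double-quantization settings plus device_map are assumptions, since the log only says "Quantizing model to 4 bit with bitsandbytes"):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit quantization as reported by "Quantizing model to 4 bit with bitsandbytes".
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # matches the bfloat16 torch_dtype in the config dump
        bnb_4bit_quant_type="nf4",              # assumed, not stated in the log
        bnb_4bit_use_double_quant=True,         # assumed, not stated in the log
    )

    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/deepseek-llm-7b-base",
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        attn_implementation="sdpa",  # "Using torch SDPA for faster training and inference."
        device_map="auto",           # assumed placement strategy
    )

    model.config.use_cache = False          # "KV cache is disabled during training."
    model.gradient_checkpointing_enable()   # "Gradient checkpointing enabled."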
[INFO|2025-04-22 23:34:25] logging.py:143 >> Fine-tuning method: LoRA
[INFO|2025-04-22 23:34:25] logging.py:143 >> Found linear modules: o_proj,q_proj,gate_proj,k_proj,down_proj,v_proj,up_proj
[INFO|2025-04-22 23:34:26] logging.py:143 >> trainable params: 18,739,200 || all params: 6,929,104,896 || trainable%: 0.2704
[INFO|2025-04-22 23:34:26] trainer.py:748 >> Using auto half precision backend
[INFO|2025-04-22 23:34:26] trainer.py:2414 >> ***** Running training *****
[INFO|2025-04-22 23:34:26] trainer.py:2415 >> Num examples = 1,307
[INFO|2025-04-22 23:34:26] trainer.py:2416 >> Num Epochs = 3
[INFO|2025-04-22 23:34:26] trainer.py:2417 >> Instantaneous batch size per device = 2
[INFO|2025-04-22 23:34:26] trainer.py:2420 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|2025-04-22 23:34:26] trainer.py:2421 >> Gradient Accumulation steps = 8
[INFO|2025-04-22 23:34:26] trainer.py:2422 >> Total optimization steps = 243
[INFO|2025-04-22 23:34:26] trainer.py:2423 >> Number of trainable parameters = 18,739,200
[INFO|2025-04-22 23:36:16] logging.py:143 >> {'loss': 2.3883, 'learning_rate': 1.9987e-04, 'epoch': 0.06, 'throughput': 36.53}
[INFO|2025-04-22 23:38:05] logging.py:143 >> {'loss': 1.4752, 'learning_rate': 1.9932e-04, 'epoch': 0.12, 'throughput': 36.29}
[INFO|2025-04-22 23:39:57] logging.py:143 >> {'loss': 1.1174, 'learning_rate': 1.9837e-04, 'epoch': 0.18, 'throughput': 36.12}
[INFO|2025-04-22 23:41:45] logging.py:143 >> {'loss': 0.7815, 'learning_rate': 1.9700e-04, 'epoch': 0.24, 'throughput': 36.03}
[INFO|2025-04-22 23:43:35] logging.py:143 >> {'loss': 0.6320, 'learning_rate': 1.9522e-04, 'epoch': 0.31, 'throughput': 36.22}
[INFO|2025-04-22 23:45:25] logging.py:143 >> {'loss': 0.4603, 'learning_rate': 1.9305e-04, 'epoch': 0.37, 'throughput': 36.20}
[INFO|2025-04-22 23:47:17] logging.py:143 >> {'loss': 0.4761, 'learning_rate': 1.9049e-04, 'epoch': 0.43, 'throughput': 36.25}
[INFO|2025-04-22 23:49:06] logging.py:143 >> {'loss': 0.4525, 'learning_rate': 1.8756e-04, 'epoch': 0.49, 'throughput': 36.22}
[INFO|2025-04-22 23:50:55] logging.py:143 >> {'loss': 0.4803, 'learning_rate': 1.8425e-04, 'epoch': 0.55, 'throughput': 36.12}
[INFO|2025-04-22 23:52:45] logging.py:143 >> {'loss': 0.3933, 'learning_rate': 1.8060e-04, 'epoch': 0.61, 'throughput': 36.11}
[INFO|2025-04-22 23:54:39] logging.py:143 >> {'loss': 0.3919, 'learning_rate': 1.7660e-04, 'epoch': 0.67, 'throughput': 36.22}
[INFO|2025-04-22 23:56:33] logging.py:143 >> {'loss': 0.3804, 'learning_rate': 1.7229e-04, 'epoch': 0.73, 'throughput': 36.31}
[INFO|2025-04-22 23:58:22] logging.py:143 >> {'loss': 0.3576, 'learning_rate': 1.6768e-04, 'epoch': 0.80, 'throughput': 36.28}
[INFO|2025-04-23 00:00:11] logging.py:143 >> {'loss': 0.3343, 'learning_rate': 1.6278e-04, 'epoch': 0.86, 'throughput': 36.32}
[INFO|2025-04-23 00:01:59] logging.py:143 >> {'loss': 0.4526, 'learning_rate': 1.5762e-04, 'epoch': 0.92, 'throughput': 36.27}
[INFO|2025-04-23 00:03:48] logging.py:143 >> {'loss': 0.4048, 'learning_rate': 1.5222e-04, 'epoch': 0.98, 'throughput': 36.24}
[INFO|2025-04-23 00:05:46] logging.py:143 >> {'loss': 0.3599, 'learning_rate': 1.4660e-04, 'epoch': 1.05, 'throughput': 36.26}
[INFO|2025-04-23 00:07:38] logging.py:143 >> {'loss': 0.2197, 'learning_rate': 1.4079e-04, 'epoch': 1.11, 'throughput': 36.31}
[INFO|2025-04-23 00:09:28] logging.py:143 >> {'loss': 0.1816, 'learning_rate': 1.3481e-04, 'epoch': 1.17, 'throughput': 36.33}
[INFO|2025-04-23 00:11:19] logging.py:143 >> {'loss': 0.1839, 'learning_rate': 1.2868e-04, 'epoch': 1.23, 'throughput': 36.33}
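The run banner above reports LoRA over all seven linear projections with 18,739,200 trainable parameters out of ~6.93B (0.27%), and an effective schedule of batch size 2 x gradient accumulation 8 = 16, 3 epochs, 243 optimization steps. The parameter count is consistent with LoRA rank 8: per layer, 8 x [4x(4096+4096) + 2x(4096+11008) + (11008+4096)] = 624,640, and 30 layers x 624,640 = 18,739,200. A PEFT/Trainer configuration consistent with these numbers (a sketch; rank 8 is inferred from the count above, while lora_alpha=16 and lora_dropout=0 are assumptions not shown in the log):

    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import TrainingArguments

    # Upcasts norms/embeddings and trainable params to float32 for stable k-bit training,
    # matching "Upcasting trainable params to float32."
    model = prepare_model_for_kbit_training(model)  # `model` from the 4-bit loading sketch above

    lora_config = LoraConfig(
        task_type="CAUSAL_LM",
        r=8,                # inferred from the 18,739,200 trainable-parameter count
        lora_alpha=16,      # assumed
        lora_dropout=0.0,   # assumed
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # should report ~18,739,200 trainable params

    # Schedule matching the trainer banner and the logged learning-rate curve.
    training_args = TrainingArguments(
        output_dir="saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",  # assumed; the logged lr values follow a cosine decay
        bf16=True,
        logging_steps=5,             # matches the ~0.06-epoch logging interval
        save_steps=100,              # matches checkpoint-100 and checkpoint-200 below
    )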
[INFO|2025-04-23 00:11:19] trainer.py:3984 >> Saving model checkpoint to saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-100
[INFO|2025-04-23 00:11:19] configuration_utils.py:693 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/config.json
[INFO|2025-04-23 00:11:19] configuration_utils.py:765 >> Model config LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 30, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "vocab_size": 102400 }
[INFO|2025-04-23 00:11:19] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-100/tokenizer_config.json
[INFO|2025-04-23 00:11:19] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-100/special_tokens_map.json
[INFO|2025-04-23 00:13:14] logging.py:143 >> {'loss': 0.2100, 'learning_rate': 1.2243e-04, 'epoch': 1.29, 'throughput': 36.25}
[INFO|2025-04-23 00:15:04] logging.py:143 >> {'loss': 0.2568, 'learning_rate': 1.1609e-04, 'epoch': 1.35, 'throughput': 36.24}
[INFO|2025-04-23 00:16:56] logging.py:143 >> {'loss': 0.2049, 'learning_rate': 1.0968e-04, 'epoch': 1.42, 'throughput': 36.27}
[INFO|2025-04-23 00:18:44] logging.py:143 >> {'loss': 0.2213, 'learning_rate': 1.0323e-04, 'epoch': 1.48, 'throughput': 36.28}
[INFO|2025-04-23 00:20:34] logging.py:143 >> {'loss': 0.2836, 'learning_rate': 9.6768e-05, 'epoch': 1.54, 'throughput': 36.28}
[INFO|2025-04-23 00:22:25] logging.py:143 >> {'loss': 0.2280, 'learning_rate': 9.0319e-05, 'epoch': 1.60, 'throughput': 36.28}
[INFO|2025-04-23 00:24:15] logging.py:143 >> {'loss': 0.1778, 'learning_rate': 8.3910e-05, 'epoch': 1.66, 'throughput': 36.28}
[INFO|2025-04-23 00:26:03] logging.py:143 >> {'loss': 0.1871, 'learning_rate': 7.7568e-05, 'epoch': 1.72, 'throughput': 36.27}
[INFO|2025-04-23 00:27:52] logging.py:143 >> {'loss': 0.2289, 'learning_rate': 7.1320e-05, 'epoch': 1.78, 'throughput': 36.27}
[INFO|2025-04-23 00:29:41] logging.py:143 >> {'loss': 0.2889, 'learning_rate': 6.5191e-05, 'epoch': 1.84, 'throughput': 36.27}
[INFO|2025-04-23 00:31:32] logging.py:143 >> {'loss': 0.2724, 'learning_rate': 5.9208e-05, 'epoch': 1.91, 'throughput': 36.25}
[INFO|2025-04-23 00:33:23] logging.py:143 >> {'loss': 0.2359, 'learning_rate': 5.3396e-05, 'epoch': 1.97, 'throughput': 36.26}
[INFO|2025-04-23 00:35:23] logging.py:143 >> {'loss': 0.1864, 'learning_rate': 4.7778e-05, 'epoch': 2.04, 'throughput': 36.26}
[INFO|2025-04-23 00:37:13] logging.py:143 >> {'loss': 0.0671, 'learning_rate': 4.2378e-05, 'epoch': 2.10, 'throughput': 36.26}
[INFO|2025-04-23 00:39:02] logging.py:143 >> {'loss': 0.1465, 'learning_rate': 3.7219e-05, 'epoch': 2.16, 'throughput': 36.23}
[INFO|2025-04-23 00:40:51] logging.py:143 >> {'loss': 0.0910, 'learning_rate': 3.2322e-05, 'epoch': 2.22, 'throughput': 36.22}
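Intermediate checkpoints such as checkpoint-100 above hold the LoRA adapter weights plus the tokenizer files, so an interrupted run can be resumed from them. A minimal sketch of resuming with the plain Hugging Face Trainer (an assumption for illustration; the actual run drives training through LLaMA-Factory, which exposes the same resume behavior via its own options, and `model`, `training_args`, `tokenizer`, and `tokenized` come from the earlier sketches):

    from transformers import Trainer, DataCollatorForLanguageModeling

    trainer = Trainer(
        model=model,                 # PEFT-wrapped 4-bit model from the sketches above
        args=training_args,
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )

    # Resume from the checkpoint the log shows being written at step 100.
    trainer.train(
        resume_from_checkpoint="saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-100"
    )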
[INFO|2025-04-23 00:42:41] logging.py:143 >> {'loss': 0.1140, 'learning_rate': 2.7708e-05, 'epoch': 2.28, 'throughput': 36.21}
[INFO|2025-04-23 00:44:33] logging.py:143 >> {'loss': 0.1381, 'learning_rate': 2.3396e-05, 'epoch': 2.34, 'throughput': 36.23}
[INFO|2025-04-23 00:46:22] logging.py:143 >> {'loss': 0.0799, 'learning_rate': 1.9403e-05, 'epoch': 2.40, 'throughput': 36.24}
[INFO|2025-04-23 00:48:14] logging.py:143 >> {'loss': 0.0722, 'learning_rate': 1.5748e-05, 'epoch': 2.46, 'throughput': 36.25}
[INFO|2025-04-23 00:48:14] trainer.py:3984 >> Saving model checkpoint to saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-200
[INFO|2025-04-23 00:48:14] configuration_utils.py:693 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/config.json
[INFO|2025-04-23 00:48:14] configuration_utils.py:765 >> Model config LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 30, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "vocab_size": 102400 }
[INFO|2025-04-23 00:48:15] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-200/tokenizer_config.json
[INFO|2025-04-23 00:48:15] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-200/special_tokens_map.json
[INFO|2025-04-23 00:50:13] logging.py:143 >> {'loss': 0.1003, 'learning_rate': 1.2444e-05, 'epoch': 2.53, 'throughput': 36.23}
[INFO|2025-04-23 00:52:01] logging.py:143 >> {'loss': 0.1155, 'learning_rate': 9.5063e-06, 'epoch': 2.59, 'throughput': 36.21}
[INFO|2025-04-23 00:53:51] logging.py:143 >> {'loss': 0.0962, 'learning_rate': 6.9464e-06, 'epoch': 2.65, 'throughput': 36.22}
[INFO|2025-04-23 00:55:43] logging.py:143 >> {'loss': 0.0808, 'learning_rate': 4.7752e-06, 'epoch': 2.71, 'throughput': 36.23}
[INFO|2025-04-23 00:57:33] logging.py:143 >> {'loss': 0.1206, 'learning_rate': 3.0018e-06, 'epoch': 2.77, 'throughput': 36.22}
[INFO|2025-04-23 00:59:23] logging.py:143 >> {'loss': 0.1259, 'learning_rate': 1.6335e-06, 'epoch': 2.83, 'throughput': 36.22}
[INFO|2025-04-23 01:01:11] logging.py:143 >> {'loss': 0.1580, 'learning_rate': 6.7616e-07, 'epoch': 2.89, 'throughput': 36.22}
[INFO|2025-04-23 01:02:59] logging.py:143 >> {'loss': 0.0620, 'learning_rate': 1.3368e-07, 'epoch': 2.95, 'throughput': 36.21}
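The logged learning rates decay smoothly from ~2e-4 toward zero over the 243 optimization steps, consistent with a cosine schedule with no warmup. A small sketch that reproduces the curve (a check under the assumption of a cosine scheduler peaking at 2e-4; the step indices below are aligned with the 5-step logging interval and may be off by one relative to the trainer's internal counter):

    import math

    PEAK_LR = 2e-4
    TOTAL_STEPS = 243

    def cosine_lr(step: int) -> float:
        # Cosine decay from PEAK_LR at step 0 to 0 at TOTAL_STEPS, no warmup.
        progress = step / TOTAL_STEPS
        return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

    print(cosine_lr(4))    # ~1.9987e-04, matching the first logged value
    print(cosine_lr(119))  # ~1.0323e-04, matching the value logged around epoch 1.48
    print(cosine_lr(239))  # ~1.3368e-07, matching the last periodic value at epoch 2.95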
"attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 30, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "vocab_size": 102400 } [INFO|2025-04-23 01:04:08] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-243/tokenizer_config.json [INFO|2025-04-23 01:04:08] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/checkpoint-243/special_tokens_map.json [INFO|2025-04-23 01:04:09] trainer.py:2681 >> Training completed. Do not forget to share your model on huggingface.co/models =) [INFO|2025-04-23 01:04:09] trainer.py:3984 >> Saving model checkpoint to saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06 [INFO|2025-04-23 01:04:09] configuration_utils.py:693 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--deepseek-ai--deepseek-llm-7b-base/snapshots/7683fea62db869066ddaff6a41d032262c490d4f/config.json [INFO|2025-04-23 01:04:09] configuration_utils.py:765 >> Model config LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 30, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "vocab_size": 102400 } [INFO|2025-04-23 01:04:09] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/tokenizer_config.json [INFO|2025-04-23 01:04:09] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/DeepSeek-LLM-7B-Base/lora/train_2025-04-22-23-24-06/special_tokens_map.json [WARNING|2025-04-23 01:04:10] logging.py:148 >> No metric eval_loss to plot. [WARNING|2025-04-23 01:04:10] logging.py:148 >> No metric eval_accuracy to plot. [INFO|2025-04-23 01:04:10] modelcard.py:450 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}