End of training

Files changed (9)

README.md +17 -32
adapter_config.json +3 -3
last-checkpoint/adapter_config.json +3 -3
last-checkpoint/optimizer.pt +2 -2
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +51 -689
last-checkpoint/training_args.bin +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -6,7 +6,7 @@ tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: 8cdb6276-aeed-49f9-8ffa-9eea475835ec
   results: []
 ---
@@ -18,12 +18,6 @@ should probably proofread and complete it, then remove this comment. -->
 axolotl version: `0.4.1`
 ```yaml
-accelerate_config:
-  dynamo_backend: inductor
-  mixed_precision: bf16
-  num_machines: 1
-  num_processes: auto
-  use_cpu: false
 adapter: lora
 base_model: echarlaix/tiny-random-mistral
 bf16: auto
@@ -45,7 +39,6 @@ datasets:
     system_prompt: ''
 debug: null
 deepspeed: null
-device_map: auto
 early_stopping_patience: null
 eval_max_new_tokens: 128
 eval_table_size: null
@@ -54,14 +47,16 @@ flash_attention: false
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps: 16
-gradient_checkpointing: true
 group_by_length: false
 hub_model_id: null
 hub_repo: null
 hub_strategy: checkpoint
 hub_token: null
-learning_rate: 0.0001
 local_rank: null
 logging_steps: 1
 lora_alpha: 16
@@ -70,13 +65,8 @@ lora_fan_in_fan_out: null
 lora_model_dir: null
 lora_r: 8
 lora_target_linear: true
-lora_target_modules:
-- q_proj
-- v_proj
 lr_scheduler: cosine
-max_memory:
-  0: 70GiB
-max_steps: 100
 micro_batch_size: 2
 mlflow_experiment_name: /tmp/2e4f87c3b388cefd_train_data.json
 model_type: AutoModelForCausalLM
@@ -84,9 +74,6 @@ num_epochs: 1
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_24
 pad_to_sequence_len: true
-quantization_config:
-  llm_int8_enable_fp32_cpu_offload: true
-  load_in_8bit: true
 resume_from_checkpoint: null
 s2_attention: null
 sample_packing: false
@@ -97,14 +84,13 @@ special_tokens:
 strict: false
 tf32: false
 tokenizer_type: AutoTokenizer
-torch_compile: true
 train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.05
 wandb_entity: null
 wandb_mode: online
 wandb_name: 8cdb6276-aeed-49f9-8ffa-9eea475835ec
-wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: 8cdb6276-aeed-49f9-8ffa-9eea475835ec
 warmup_steps: 10
@@ -115,7 +101,7 @@ xformers_attention: null
 </details><br>
-# 8cdb6276-aeed-49f9-8ffa-9eea475835ec
 This model is a fine-tuned version of [echarlaix/tiny-random-mistral](https://huggingface.co/echarlaix/tiny-random-mistral) on the None dataset.
 It achieves the following results on the evaluation set:
@@ -138,26 +124,25 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0001
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- training_steps: 100
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.0           | 0.0038 | 1    | nan             |
-| 0.0           | 0.0946 | 25   | nan             |
-| 0.0           | 0.1892 | 50   | nan             |
-| 0.0           | 0.2838 | 75   | nan             |
-| 0.0           | 0.3784 | 100  | nan             |
 ### Framework versions

 - axolotl
 - generated_from_trainer
 model-index:
+- name: 7e5f69a5-09fd-4c79-abe8-7f4ff3f6b7e5
   results: []
 ---
 axolotl version: `0.4.1`
 ```yaml
 adapter: lora
 base_model: echarlaix/tiny-random-mistral
 bf16: auto
     system_prompt: ''
 debug: null
 deepspeed: null
 early_stopping_patience: null
 eval_max_new_tokens: 128
 eval_table_size: null
 fp16: null
 fsdp: null
 fsdp_config: null
+gradient_accumulation_steps: 4
+gradient_checkpointing: false
 group_by_length: false
 hub_model_id: null
 hub_repo: null
 hub_strategy: checkpoint
 hub_token: null
+learning_rate: 0.0002
+load_in_4bit: false
+load_in_8bit: false
 local_rank: null
 logging_steps: 1
 lora_alpha: 16
 lora_model_dir: null
 lora_r: 8
 lora_target_linear: true
 lr_scheduler: cosine
+max_steps: 10
 micro_batch_size: 2
 mlflow_experiment_name: /tmp/2e4f87c3b388cefd_train_data.json
 model_type: AutoModelForCausalLM
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_24
 pad_to_sequence_len: true
 resume_from_checkpoint: null
 s2_attention: null
 sample_packing: false
 strict: false
 tf32: false
 tokenizer_type: AutoTokenizer
 train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.05
 wandb_entity: null
 wandb_mode: online
 wandb_name: 8cdb6276-aeed-49f9-8ffa-9eea475835ec
+wandb_project: Birthday-SN56-7-Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: 8cdb6276-aeed-49f9-8ffa-9eea475835ec
 warmup_steps: 10
 </details><br>
+# 7e5f69a5-09fd-4c79-abe8-7f4ff3f6b7e5
 This model is a fine-tuned version of [echarlaix/tiny-random-mistral](https://huggingface.co/echarlaix/tiny-random-mistral) on the None dataset.
 It achieves the following results on the evaluation set:
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0002
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
+- training_steps: 10
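The hyperparameters listed above are related by simple arithmetic. The sketch below is not taken from the training code. It assumes a single training process (the reported 2 × 4 = 8 is consistent with that) and the common linear-warmup-then-cosine shape for the `cosine` scheduler. The function names are hypothetical.

```python
import math


def total_train_batch_size(micro_batch_size: int, grad_accum_steps: int,
                           num_processes: int = 1) -> int:
    """Samples contributing to one optimizer step (num_processes=1 is an assumption)."""
    return micro_batch_size * grad_accum_steps * num_processes


def lr_at_step(step: int, base_lr: float, warmup_steps: int, max_steps: int) -> float:
    """Linear warmup to base_lr, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from this commit: micro batch 2, accumulation 4, lr 0.0002, warmup 10, max_steps 10.
print(total_train_batch_size(2, 4))
print([lr_at_step(s, 2e-4, 10, 10) for s in (1, 5, 10)])
```

With `lr_scheduler_warmup_steps: 10` and `training_steps: 10`, the run ends as warmup finishes, so under these assumptions the learning rate only climbs to its 0.0002 peak and the cosine decay phase is never reached.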
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| 0.0           | 0.0009 | 1    | nan             |
+| 0.0           | 0.0028 | 3    | nan             |
+| 0.0           | 0.0057 | 6    | nan             |
+| 0.0           | 0.0085 | 9    | nan             |
 ### Framework versions

adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "gate_proj",
-    "v_proj",
     "k_proj",
     "o_proj",
     "down_proj",
     "up_proj",
-    "q_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "k_proj",
     "o_proj",
+    "gate_proj",
     "down_proj",
     "up_proj",
+    "q_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

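The `target_modules` hunk above only reorders entries. A quick way to confirm that, sketched here with the two lists copied from the diff (the variable names are illustrative):

```python
# Old and new "target_modules" lists as they appear in the adapter_config.json diff.
old_targets = ["gate_proj", "v_proj", "k_proj", "o_proj", "down_proj", "up_proj", "q_proj"]
new_targets = ["k_proj", "o_proj", "gate_proj", "down_proj", "up_proj", "q_proj", "v_proj"]

# Compare membership while ignoring order.
same_modules = set(old_targets) == set(new_targets)
print(same_modules)
```

Because the membership is identical, the `+3 -3` change in this file does not alter which layers receive LoRA adapters.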
last-checkpoint/adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "gate_proj",
-    "v_proj",
     "k_proj",
     "o_proj",
     "down_proj",
     "up_proj",
-    "q_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "k_proj",
     "o_proj",
+    "gate_proj",
     "down_proj",
     "up_proj",
+    "q_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d9a94694054557f16b87aecce198941be522358d1af0ee78b964d87b97e4d230
-size 71654

 version https://git-lfs.github.com/spec/v1
+oid sha256:bbe10c074c53af62af0964d6b70d2e6528b906557b640a82231880d56e53359d
+size 71718
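The binary files in this commit are stored as Git LFS pointers, so the diff shows only the pointer text: `oid` is the SHA-256 of the file contents and `size` is its length in bytes. A minimal sketch of how such a pointer is derived (the helper name `lfs_pointer` is hypothetical, and the sample bytes are a stand-in for a real checkpoint file):

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build the three-line Git LFS pointer text for a blob's contents."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


# Stand-in payload; a real call would pass the bytes read from the file.
print(lfs_pointer(b"hello"))
```

This is why `rng_state.pth`, `scheduler.pt`, and `training_args.bin` show a new `oid` with an unchanged `size`: their contents changed but their byte length did not.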

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d6516708bef1a97795da04892fd3df3bf2305ac978924d9e7083e4eed6ef52b1
 size 14244

 version https://git-lfs.github.com/spec/v1
+oid sha256:c12066a9c624fe38430ff3feea2dc6451e9f1a920255c11680c737a33d2c53a0
 size 14244

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:49d60a69e2379be2053e816cbaff31e6c931b5922dd86c71c9eaf473299cbf62
 size 1064

 version https://git-lfs.github.com/spec/v1
+oid sha256:bb578e75c11a81e85dda67a691f96ba4793a02960f1409fd3e1511aac873491a
 size 1064

last-checkpoint/trainer_state.json CHANGED

@@ -1,759 +1,121 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.3784295175023652,
-  "eval_steps": 25,
-  "global_step": 100,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.003784295175023652,
       "grad_norm": NaN,
-      "learning_rate": 1e-05,
       "loss": 0.0,
       "step": 1
     },
     {
-      "epoch": 0.003784295175023652,
       "eval_loss": NaN,
-      "eval_runtime": 22.8133,
-      "eval_samples_per_second": 19.506,
-      "eval_steps_per_second": 9.775,
       "step": 1
     },
     {
-      "epoch": 0.007568590350047304,
-      "grad_norm": NaN,
-      "learning_rate": 2e-05,
-      "loss": 0.0,
-      "step": 2
-    },
-    {
-      "epoch": 0.011352885525070956,
-      "grad_norm": NaN,
-      "learning_rate": 3e-05,
-      "loss": 0.0,
-      "step": 3
-    },
-    {
-      "epoch": 0.015137180700094607,
       "grad_norm": NaN,
       "learning_rate": 4e-05,
       "loss": 0.0,
-      "step": 4
-    },
-    {
-      "epoch": 0.01892147587511826,
-      "grad_norm": NaN,
-      "learning_rate": 5e-05,
-      "loss": 0.0,
-      "step": 5
     },
     {
-      "epoch": 0.02270577105014191,
       "grad_norm": NaN,
       "learning_rate": 6e-05,
       "loss": 0.0,
-      "step": 6
     },
     {
-      "epoch": 0.026490066225165563,
-      "grad_norm": NaN,
-      "learning_rate": 7e-05,
-      "loss": 0.0,
-      "step": 7
     },
     {
-      "epoch": 0.030274361400189215,
       "grad_norm": NaN,
       "learning_rate": 8e-05,
       "loss": 0.0,
-      "step": 8
-    },
-    {
-      "epoch": 0.03405865657521287,
-      "grad_norm": NaN,
-      "learning_rate": 9e-05,
-      "loss": 0.0,
-      "step": 9
     },
     {
-      "epoch": 0.03784295175023652,
       "grad_norm": NaN,
       "learning_rate": 0.0001,
       "loss": 0.0,
-      "step": 10
-    },
-    {
-      "epoch": 0.041627246925260174,
-      "grad_norm": NaN,
-      "learning_rate": 9.99695413509548e-05,
-      "loss": 0.0,
-      "step": 11
-    },
-    {
-      "epoch": 0.04541154210028382,
-      "grad_norm": NaN,
-      "learning_rate": 9.987820251299122e-05,
-      "loss": 0.0,
-      "step": 12
-    },
-    {
-      "epoch": 0.04919583727530748,
-      "grad_norm": NaN,
-      "learning_rate": 9.972609476841367e-05,
-      "loss": 0.0,
-      "step": 13
-    },
-    {
-      "epoch": 0.052980132450331126,
-      "grad_norm": NaN,
-      "learning_rate": 9.951340343707852e-05,
-      "loss": 0.0,
-      "step": 14
-    },
-    {
-      "epoch": 0.05676442762535478,
-      "grad_norm": NaN,
-      "learning_rate": 9.924038765061042e-05,
-      "loss": 0.0,
-      "step": 15
-    },
-    {
-      "epoch": 0.06054872280037843,
-      "grad_norm": NaN,
-      "learning_rate": 9.890738003669029e-05,
-      "loss": 0.0,
-      "step": 16
-    },
-    {
-      "epoch": 0.06433301797540208,
-      "grad_norm": NaN,
-      "learning_rate": 9.851478631379982e-05,
-      "loss": 0.0,
-      "step": 17
-    },
-    {
-      "epoch": 0.06811731315042574,
-      "grad_norm": NaN,
-      "learning_rate": 9.806308479691595e-05,
-      "loss": 0.0,
-      "step": 18
-    },
-    {
-      "epoch": 0.07190160832544938,
-      "grad_norm": NaN,
-      "learning_rate": 9.755282581475769e-05,
-      "loss": 0.0,
-      "step": 19
-    },
-    {
-      "epoch": 0.07568590350047304,
-      "grad_norm": NaN,
-      "learning_rate": 9.698463103929542e-05,
-      "loss": 0.0,
-      "step": 20
-    },
-    {
-      "epoch": 0.07947019867549669,
-      "grad_norm": NaN,
-      "learning_rate": 9.635919272833938e-05,
-      "loss": 0.0,
-      "step": 21
-    },
-    {
-      "epoch": 0.08325449385052035,
-      "grad_norm": NaN,
-      "learning_rate": 9.567727288213005e-05,
-      "loss": 0.0,
-      "step": 22
-    },
-    {
-      "epoch": 0.08703878902554399,
-      "grad_norm": NaN,
-      "learning_rate": 9.493970231495835e-05,
-      "loss": 0.0,
-      "step": 23
-    },
-    {
-      "epoch": 0.09082308420056764,
-      "grad_norm": NaN,
-      "learning_rate": 9.414737964294636e-05,
-      "loss": 0.0,
-      "step": 24
-    },
-    {
-      "epoch": 0.0946073793755913,
-      "grad_norm": NaN,
-      "learning_rate": 9.330127018922194e-05,
-      "loss": 0.0,
-      "step": 25
-    },
-    {
-      "epoch": 0.0946073793755913,
-      "eval_loss": NaN,
-      "eval_runtime": 1.6494,
-      "eval_samples_per_second": 269.802,
-      "eval_steps_per_second": 135.204,
-      "step": 25
-    },
-    {
-      "epoch": 0.09839167455061495,
-      "grad_norm": NaN,
-      "learning_rate": 9.24024048078213e-05,
-      "loss": 0.0,
-      "step": 26
-    },
-    {
-      "epoch": 0.1021759697256386,
-      "grad_norm": NaN,
-      "learning_rate": 9.145187862775209e-05,
-      "loss": 0.0,
-      "step": 27
-    },
-    {
-      "epoch": 0.10596026490066225,
-      "grad_norm": NaN,
-      "learning_rate": 9.045084971874738e-05,
-      "loss": 0.0,
-      "step": 28
-    },
-    {
-      "epoch": 0.1097445600756859,
-      "grad_norm": NaN,
-      "learning_rate": 8.940053768033609e-05,
-      "loss": 0.0,
-      "step": 29
-    },
-    {
-      "epoch": 0.11352885525070956,
-      "grad_norm": NaN,
-      "learning_rate": 8.83022221559489e-05,
-      "loss": 0.0,
-      "step": 30
-    },
-    {
-      "epoch": 0.1173131504257332,
-      "grad_norm": NaN,
-      "learning_rate": 8.715724127386972e-05,
-      "loss": 0.0,
-      "step": 31
-    },
-    {
-      "epoch": 0.12109744560075686,
-      "grad_norm": NaN,
-      "learning_rate": 8.596699001693255e-05,
-      "loss": 0.0,
-      "step": 32
-    },
-    {
-      "epoch": 0.12488174077578051,
-      "grad_norm": NaN,
-      "learning_rate": 8.473291852294987e-05,
-      "loss": 0.0,
-      "step": 33
-    },
-    {
-      "epoch": 0.12866603595080417,
-      "grad_norm": NaN,
-      "learning_rate": 8.345653031794292e-05,
-      "loss": 0.0,
-      "step": 34
-    },
-    {
-      "epoch": 0.13245033112582782,
-      "grad_norm": NaN,
-      "learning_rate": 8.213938048432697e-05,
-      "loss": 0.0,
-      "step": 35
-    },
-    {
-      "epoch": 0.13623462630085148,
-      "grad_norm": NaN,
-      "learning_rate": 8.07830737662829e-05,
-      "loss": 0.0,
-      "step": 36
-    },
-    {
-      "epoch": 0.1400189214758751,
-      "grad_norm": NaN,
-      "learning_rate": 7.938926261462366e-05,
-      "loss": 0.0,
-      "step": 37
-    },
-    {
-      "epoch": 0.14380321665089876,
-      "grad_norm": NaN,
-      "learning_rate": 7.795964517353735e-05,
-      "loss": 0.0,
-      "step": 38
-    },
-    {
-      "epoch": 0.14758751182592242,
-      "grad_norm": NaN,
-      "learning_rate": 7.649596321166024e-05,
-      "loss": 0.0,
-      "step": 39
-    },
-    {
-      "epoch": 0.15137180700094607,
-      "grad_norm": NaN,
-      "learning_rate": 7.500000000000001e-05,
-      "loss": 0.0,
-      "step": 40
-    },
-    {
-      "epoch": 0.15515610217596973,
-      "grad_norm": NaN,
-      "learning_rate": 7.347357813929454e-05,
-      "loss": 0.0,
-      "step": 41
-    },
-    {
-      "epoch": 0.15894039735099338,
-      "grad_norm": NaN,
-      "learning_rate": 7.191855733945387e-05,
-      "loss": 0.0,
-      "step": 42
-    },
-    {
-      "epoch": 0.16272469252601704,
-      "grad_norm": NaN,
-      "learning_rate": 7.033683215379002e-05,
-      "loss": 0.0,
-      "step": 43
-    },
-    {
-      "epoch": 0.1665089877010407,
-      "grad_norm": NaN,
-      "learning_rate": 6.873032967079561e-05,
-      "loss": 0.0,
-      "step": 44
-    },
-    {
-      "epoch": 0.17029328287606432,
-      "grad_norm": NaN,
-      "learning_rate": 6.710100716628344e-05,
-      "loss": 0.0,
-      "step": 45
-    },
-    {
-      "epoch": 0.17407757805108798,
-      "grad_norm": NaN,
-      "learning_rate": 6.545084971874738e-05,
-      "loss": 0.0,
-      "step": 46
-    },
-    {
-      "epoch": 0.17786187322611163,
-      "grad_norm": NaN,
-      "learning_rate": 6.378186779084995e-05,
-      "loss": 0.0,
-      "step": 47
-    },
-    {
-      "epoch": 0.1816461684011353,
-      "grad_norm": NaN,
-      "learning_rate": 6.209609477998338e-05,
-      "loss": 0.0,
-      "step": 48
-    },
-    {
-      "epoch": 0.18543046357615894,
-      "grad_norm": NaN,
-      "learning_rate": 6.0395584540887963e-05,
-      "loss": 0.0,
-      "step": 49
     },
     {
-      "epoch": 0.1892147587511826,
       "grad_norm": NaN,
-      "learning_rate": 5.868240888334653e-05,
       "loss": 0.0,
-      "step": 50
     },
     {
-      "epoch": 0.1892147587511826,
       "eval_loss": NaN,
-      "eval_runtime": 1.6547,
-      "eval_samples_per_second": 268.932,
-      "eval_steps_per_second": 134.768,
-      "step": 50
-    },
-    {
-      "epoch": 0.19299905392620625,
-      "grad_norm": NaN,
-      "learning_rate": 5.695865504800327e-05,
-      "loss": 0.0,
-      "step": 51
-    },
-    {
-      "epoch": 0.1967833491012299,
-      "grad_norm": NaN,
-      "learning_rate": 5.522642316338268e-05,
-      "loss": 0.0,
-      "step": 52
-    },
-    {
-      "epoch": 0.20056764427625354,
-      "grad_norm": NaN,
-      "learning_rate": 5.348782368720626e-05,
-      "loss": 0.0,
-      "step": 53
-    },
-    {
-      "epoch": 0.2043519394512772,
-      "grad_norm": NaN,
-      "learning_rate": 5.174497483512506e-05,
-      "loss": 0.0,
-      "step": 54
-    },
-    {
-      "epoch": 0.20813623462630085,
-      "grad_norm": NaN,
-      "learning_rate": 5e-05,
-      "loss": 0.0,
-      "step": 55
-    },
-    {
-      "epoch": 0.2119205298013245,
-      "grad_norm": NaN,
-      "learning_rate": 4.825502516487497e-05,
-      "loss": 0.0,
-      "step": 56
-    },
-    {
-      "epoch": 0.21570482497634816,
-      "grad_norm": NaN,
-      "learning_rate": 4.6512176312793736e-05,
-      "loss": 0.0,
-      "step": 57
-    },
-    {
-      "epoch": 0.2194891201513718,
-      "grad_norm": NaN,
-      "learning_rate": 4.477357683661734e-05,
-      "loss": 0.0,
-      "step": 58
-    },
-    {
-      "epoch": 0.22327341532639547,
-      "grad_norm": NaN,
-      "learning_rate": 4.3041344951996746e-05,
-      "loss": 0.0,
-      "step": 59
-    },
-    {
-      "epoch": 0.22705771050141912,
-      "grad_norm": NaN,
-      "learning_rate": 4.131759111665349e-05,
-      "loss": 0.0,
-      "step": 60
-    },
-    {
-      "epoch": 0.23084200567644275,
-      "grad_norm": NaN,
-      "learning_rate": 3.960441545911204e-05,
-      "loss": 0.0,
-      "step": 61
-    },
-    {
-      "epoch": 0.2346263008514664,
-      "grad_norm": NaN,
-      "learning_rate": 3.790390522001662e-05,
-      "loss": 0.0,
-      "step": 62
-    },
-    {
-      "epoch": 0.23841059602649006,
-      "grad_norm": NaN,
-      "learning_rate": 3.6218132209150045e-05,
-      "loss": 0.0,
-      "step": 63
-    },
-    {
-      "epoch": 0.24219489120151372,
-      "grad_norm": NaN,
-      "learning_rate": 3.4549150281252636e-05,
-      "loss": 0.0,
-      "step": 64
-    },
-    {
-      "epoch": 0.24597918637653737,
-      "grad_norm": NaN,
-      "learning_rate": 3.289899283371657e-05,
-      "loss": 0.0,
-      "step": 65
-    },
-    {
-      "epoch": 0.24976348155156103,
-      "grad_norm": NaN,
-      "learning_rate": 3.12696703292044e-05,
-      "loss": 0.0,
-      "step": 66
-    },
-    {
-      "epoch": 0.2535477767265847,
-      "grad_norm": NaN,
-      "learning_rate": 2.9663167846209998e-05,
-      "loss": 0.0,
-      "step": 67
-    },
-    {
-      "epoch": 0.25733207190160834,
-      "grad_norm": NaN,
-      "learning_rate": 2.8081442660546125e-05,
-      "loss": 0.0,
-      "step": 68
-    },
-    {
-      "epoch": 0.261116367076632,
-      "grad_norm": NaN,
-      "learning_rate": 2.6526421860705473e-05,
-      "loss": 0.0,
-      "step": 69
-    },
-    {
-      "epoch": 0.26490066225165565,
-      "grad_norm": NaN,
-      "learning_rate": 2.500000000000001e-05,
-      "loss": 0.0,
-      "step": 70
-    },
-    {
-      "epoch": 0.2686849574266793,
-      "grad_norm": NaN,
-      "learning_rate": 2.350403678833976e-05,
-      "loss": 0.0,
-      "step": 71
-    },
-    {
-      "epoch": 0.27246925260170296,
-      "grad_norm": NaN,
-      "learning_rate": 2.2040354826462668e-05,
-      "loss": 0.0,
-      "step": 72
     },
     {
-      "epoch": 0.27625354777672656,
       "grad_norm": NaN,
-      "learning_rate": 2.061073738537635e-05,
       "loss": 0.0,
-      "step": 73
     },
     {
-      "epoch": 0.2800378429517502,
       "grad_norm": NaN,
-      "learning_rate": 1.9216926233717085e-05,
       "loss": 0.0,
-      "step": 74
     },
     {
-      "epoch": 0.28382213812677387,
       "grad_norm": NaN,
-      "learning_rate": 1.7860619515673033e-05,
       "loss": 0.0,
-      "step": 75
     },
     {
-      "epoch": 0.28382213812677387,
       "eval_loss": NaN,
-      "eval_runtime": 1.6574,
-      "eval_samples_per_second": 268.495,
-      "eval_steps_per_second": 134.549,
-      "step": 75
-    },
-    {
-      "epoch": 0.2876064333017975,
-      "grad_norm": NaN,
-      "learning_rate": 1.6543469682057106e-05,
-      "loss": 0.0,
-      "step": 76
-    },
-    {
-      "epoch": 0.2913907284768212,
-      "grad_norm": NaN,
-      "learning_rate": 1.526708147705013e-05,
-      "loss": 0.0,
-      "step": 77
-    },
-    {
-      "epoch": 0.29517502365184484,
-      "grad_norm": NaN,
-      "learning_rate": 1.4033009983067452e-05,
-      "loss": 0.0,
-      "step": 78
-    },
-    {
-      "epoch": 0.2989593188268685,
-      "grad_norm": NaN,
-      "learning_rate": 1.2842758726130283e-05,
-      "loss": 0.0,
-      "step": 79
-    },
-    {
-      "epoch": 0.30274361400189215,
-      "grad_norm": NaN,
-      "learning_rate": 1.1697777844051105e-05,
-      "loss": 0.0,
-      "step": 80
-    },
-    {
-      "epoch": 0.3065279091769158,
-      "grad_norm": NaN,
-      "learning_rate": 1.0599462319663905e-05,
-      "loss": 0.0,
-      "step": 81
-    },
-    {
-      "epoch": 0.31031220435193946,
-      "grad_norm": NaN,
-      "learning_rate": 9.549150281252633e-06,
-      "loss": 0.0,
-      "step": 82
-    },
-    {
-      "epoch": 0.3140964995269631,
-      "grad_norm": NaN,
-      "learning_rate": 8.548121372247918e-06,
-      "loss": 0.0,
-      "step": 83
-    },
-    {
-      "epoch": 0.31788079470198677,
-      "grad_norm": NaN,
-      "learning_rate": 7.597595192178702e-06,
-      "loss": 0.0,
-      "step": 84
-    },
-    {
-      "epoch": 0.3216650898770104,
-      "grad_norm": NaN,
-      "learning_rate": 6.698729810778065e-06,
-      "loss": 0.0,
-      "step": 85
-    },
-    {
-      "epoch": 0.3254493850520341,
-      "grad_norm": NaN,
-      "learning_rate": 5.852620357053651e-06,
-      "loss": 0.0,
-      "step": 86
-    },
-    {
-      "epoch": 0.32923368022705773,
-      "grad_norm": NaN,
-      "learning_rate": 5.060297685041659e-06,
-      "loss": 0.0,
-      "step": 87
-    },
-    {
-      "epoch": 0.3330179754020814,
-      "grad_norm": NaN,
-      "learning_rate": 4.322727117869951e-06,
-      "loss": 0.0,
-      "step": 88
-    },
-    {
-      "epoch": 0.336802270577105,
-      "grad_norm": NaN,
-      "learning_rate": 3.6408072716606346e-06,
-      "loss": 0.0,
-      "step": 89
-    },
-    {
-      "epoch": 0.34058656575212864,
-      "grad_norm": NaN,
-      "learning_rate": 3.0153689607045845e-06,
-      "loss": 0.0,
-      "step": 90
-    },
-    {
-      "epoch": 0.3443708609271523,
-      "grad_norm": NaN,
-      "learning_rate": 2.4471741852423237e-06,
-      "loss": 0.0,
-      "step": 91
-    },
-    {
-      "epoch": 0.34815515610217596,
-      "grad_norm": NaN,
-      "learning_rate": 1.9369152030840556e-06,
-      "loss": 0.0,
-      "step": 92
-    },
-    {
-      "epoch": 0.3519394512771996,
-      "grad_norm": NaN,
-      "learning_rate": 1.4852136862001764e-06,
-      "loss": 0.0,
-      "step": 93
-    },
-    {
-      "epoch": 0.35572374645222327,
-      "grad_norm": NaN,
-      "learning_rate": 1.0926199633097157e-06,
-      "loss": 0.0,
-      "step": 94
-    },
-    {
-      "epoch": 0.3595080416272469,
-      "grad_norm": NaN,
-      "learning_rate": 7.596123493895991e-07,
-      "loss": 0.0,
-      "step": 95
-    },
-    {
-      "epoch": 0.3632923368022706,
-      "grad_norm": NaN,
-      "learning_rate": 4.865965629214819e-07,
-      "loss": 0.0,
-      "step": 96
-    },
-    {
-      "epoch": 0.36707663197729423,
-      "grad_norm": NaN,
-      "learning_rate": 2.7390523158633554e-07,
-      "loss": 0.0,
-      "step": 97
-    },
-    {
-      "epoch": 0.3708609271523179,
-      "grad_norm": NaN,
-      "learning_rate": 1.2179748700879012e-07,
-      "loss": 0.0,
-      "step": 98
-    },
-    {
-      "epoch": 0.37464522232734154,
-      "grad_norm": NaN,
-      "learning_rate": 3.04586490452119e-08,
-      "loss": 0.0,
-      "step": 99
     },
     {
-      "epoch": 0.3784295175023652,
       "grad_norm": NaN,
-      "learning_rate": 0.0,
       "loss": 0.0,
-      "step": 100
-    },
-    {
-      "epoch": 0.3784295175023652,
-      "eval_loss": NaN,
-      "eval_runtime": 1.6516,
-      "eval_samples_per_second": 269.436,
-      "eval_steps_per_second": 135.021,
-      "step": 100
     }
   ],
   "logging_steps": 1,
-  "max_steps": 100,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 1,
-  "save_steps": 25,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
@@ -766,7 +128,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 10265926041600.0,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

 {
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.00946073793755913,
+  "eval_steps": 3,
+  "global_step": 10,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
+      "epoch": 0.000946073793755913,
       "grad_norm": NaN,
+      "learning_rate": 2e-05,
       "loss": 0.0,
       "step": 1
     },
     {
+      "epoch": 0.000946073793755913,
       "eval_loss": NaN,
+      "eval_runtime": 2.1382,
+      "eval_samples_per_second": 208.123,
+      "eval_steps_per_second": 104.296,
       "step": 1
     },
     {
+      "epoch": 0.001892147587511826,
       "grad_norm": NaN,
       "learning_rate": 4e-05,
       "loss": 0.0,
+      "step": 2
     },
     {
+      "epoch": 0.002838221381267739,
       "grad_norm": NaN,
       "learning_rate": 6e-05,
       "loss": 0.0,
+      "step": 3
     },
     {
+      "epoch": 0.002838221381267739,
+      "eval_loss": NaN,
+      "eval_runtime": 1.2299,
+      "eval_samples_per_second": 361.829,
+      "eval_steps_per_second": 181.321,
+      "step": 3
     },
     {
+      "epoch": 0.003784295175023652,
       "grad_norm": NaN,
       "learning_rate": 8e-05,
       "loss": 0.0,
+      "step": 4
     },
     {
+      "epoch": 0.004730368968779565,
       "grad_norm": NaN,
       "learning_rate": 0.0001,
       "loss": 0.0,
+      "step": 5
     },
     {
+      "epoch": 0.005676442762535478,
       "grad_norm": NaN,
+      "learning_rate": 0.00012,
       "loss": 0.0,
+      "step": 6
     },
     {
+      "epoch": 0.005676442762535478,
       "eval_loss": NaN,
+      "eval_runtime": 1.2783,
+      "eval_samples_per_second": 348.122,
+      "eval_steps_per_second": 174.452,
+      "step": 6
     },
     {
+      "epoch": 0.006622516556291391,
       "grad_norm": NaN,
+      "learning_rate": 0.00014,
       "loss": 0.0,
+      "step": 7
     },
     {
+      "epoch": 0.007568590350047304,
       "grad_norm": NaN,
+      "learning_rate": 0.00016,
       "loss": 0.0,
+      "step": 8
     },
     {
+      "epoch": 0.008514664143803218,
       "grad_norm": NaN,
+      "learning_rate": 0.00018,
       "loss": 0.0,
+      "step": 9
     },
     {
+      "epoch": 0.008514664143803218,
       "eval_loss": NaN,
+      "eval_runtime": 1.2302,
+      "eval_samples_per_second": 361.741,
+      "eval_steps_per_second": 181.277,
+      "step": 9
     },
     {
+      "epoch": 0.00946073793755913,
       "grad_norm": NaN,
+      "learning_rate": 0.0002,
       "loss": 0.0,
+      "step": 10
     }
   ],
   "logging_steps": 1,
+  "max_steps": 10,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 1,
+  "save_steps": 3,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
       "attributes": {}
     }
   },
+  "total_flos": 256648151040.0,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null

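Every `grad_norm` and `eval_loss` entry in the `log_history` above is `NaN`. A short sketch for flagging such entries programmatically, using a trimmed excerpt of the first two log records from this diff (the helper name `non_finite_entries` is hypothetical):

```python
import json
import math

# Trimmed excerpt of last-checkpoint/trainer_state.json from this commit.
state_text = """
{
  "global_step": 10,
  "log_history": [
    {"epoch": 0.000946073793755913, "grad_norm": NaN,
     "learning_rate": 2e-05, "loss": 0.0, "step": 1},
    {"epoch": 0.000946073793755913, "eval_loss": NaN, "step": 1}
  ]
}
"""

# Python's json module accepts the non-standard NaN literal by default.
state = json.loads(state_text)


def non_finite_entries(log_history):
    """Return (step, key) pairs whose value is NaN or infinite."""
    bad = []
    for entry in log_history:
        for key, value in entry.items():
            if isinstance(value, float) and not math.isfinite(value):
                bad.append((entry["step"], key))
    return bad


print(non_finite_entries(state["log_history"]))
```

A training loss of 0.0 alongside NaN gradient norms and NaN evaluation loss more likely points to a numerical problem than to convergence, so a check like this is worth running before trusting the checkpoint.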
last-checkpoint/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dc00a92cfc7c8c98b0bc5d7150d56efdb7b38c866732e449bb7d7de115fa289e
 size 6776

 version https://git-lfs.github.com/spec/v1
+oid sha256:5008c806fe2aaf7df7590c5412697e0c6dd5a1a55326923e59d716f7c97f2264
 size 6776

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dc00a92cfc7c8c98b0bc5d7150d56efdb7b38c866732e449bb7d7de115fa289e
 size 6776

 version https://git-lfs.github.com/spec/v1
+oid sha256:5008c806fe2aaf7df7590c5412697e0c6dd5a1a55326923e59d716f7c97f2264
 size 6776