SudiptoPramanik committed on
Commit 8b64380 · verified · 1 Parent(s): 47bde41

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,55 +1,202 @@
  ---
  library_name: peft
- license: llama3.2
- base_model: meta-llama/Llama-3.2-1B
- tags:
- - generated_from_trainer
- model-index:
- - name: llama-finetuned
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # llama-finetuned

- This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the None dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 5
- - mixed_precision_training: Native AMP

- ### Training results

  ### Framework versions

- - PEFT 0.15.2
- - Transformers 4.52.4
- - Pytorch 2.6.0+cu124
- - Datasets 2.14.4
- - Tokenizers 0.21.1
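The hyperparameter list removed above maps directly onto `transformers.TrainingArguments`; a minimal, hedged sketch of that mapping (the `output_dir` value is an illustrative assumption — the original training script is not part of this commit, and the import is deferred so the function stays a sketch):

```python
def build_training_args(output_dir="llama-finetuned"):
    """Reconstruct the removed hyperparameter list as TrainingArguments.

    The Trainer's optimizer defaults (AdamW with betas=(0.9, 0.999),
    eps=1e-08) already match the values stated in the old card.
    """
    from transformers import TrainingArguments  # requires `transformers`

    return TrainingArguments(
        output_dir=output_dir,          # assumption: not recorded in the card
        learning_rate=5e-05,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        seed=42,
        lr_scheduler_type="linear",
        num_train_epochs=5,
        fp16=True,                      # "Native AMP" mixed precision
    )
```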
 
  ---
+ base_model: meta-llama/Llama-3.2-3B-Instruct
  library_name: peft
  ---

+ # Model Card for Model ID

+ <!-- Provide a quick summary of what the model is/does. -->

+ ## Model Details

+ ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]

+ ### Model Sources [optional]

+ <!-- Provide the basic links for the model. -->

+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]

+ ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ [More Information Needed]

+ ### Downstream Use [optional]

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ [More Information Needed]

+ ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ [More Information Needed]

+ ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ [More Information Needed]

+ ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

+ ## How to Get Started with the Model

+ Use the code below to get started with the model.

+ [More Information Needed]

+ ## Training Details

+ ### Training Data

+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ [More Information Needed]

+ ### Training Procedure

+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing [optional]

+ [More Information Needed]

+ #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ [More Information Needed]

+ ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->

+ ### Testing Data, Factors & Metrics

+ #### Testing Data

+ <!-- This should link to a Dataset Card if possible. -->

+ [More Information Needed]

+ #### Factors

+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

+ [More Information Needed]

+ #### Metrics

+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->

+ [More Information Needed]

+ ### Results

+ [More Information Needed]

+ #### Summary

+ ## Model Examination [optional]

+ <!-- Relevant interpretability work for the model goes here -->

+ [More Information Needed]

+ ## Environmental Impact

+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]

+ ## Technical Specifications [optional]

+ ### Model Architecture and Objective

+ [More Information Needed]

+ ### Compute Infrastructure

+ [More Information Needed]

+ #### Hardware

+ [More Information Needed]

+ #### Software

+ [More Information Needed]

+ ## Citation [optional]

+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+ **BibTeX:**

+ [More Information Needed]

+ **APA:**

+ [More Information Needed]

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]
  ### Framework versions

+ - PEFT 0.15.2
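The new card's "How to Get Started" section is still a placeholder. A hedged loading sketch for a PEFT adapter over the new base model (the adapter repo id below is a hypothetical placeholder, not taken from this commit, and the imports are deferred so the function is only a sketch):

```python
def load_finetuned(adapter_repo="your-username/llama-finetuned"):
    """Load the 3B-Instruct base model and attach the LoRA adapter.

    `adapter_repo` is a placeholder -- substitute the actual Hub id of
    the uploaded adapter before use.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # requires `transformers`
    from peft import PeftModel  # requires `peft`

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
    model = PeftModel.from_pretrained(base, adapter_repo)
    tokenizer = AutoTokenizer.from_pretrained(adapter_repo)
    return model, tokenizer
```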
 
 
 
 
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -24,10 +24,10 @@
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "v_proj",
28
- "q_proj",
29
  "k_proj",
30
- "o_proj"
 
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
+ "o_proj",
 
28
  "k_proj",
29
+ "v_proj",
30
+ "q_proj"
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85a94d974409456e3c95935ba3868ea2a1ce6587e7ca88a8214846c9ee0130dd
- size 6832520

  version https://git-lfs.github.com/spec/v1
+ oid sha256:cc7ea701768cfbdaa71d079b215f5f549a6f19783aa970133015c0bcd11942a9
+ size 18379784
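The adapter file roughly tripled because the base model grew from 1B to 3B parameters and the LoRA configuration apparently changed. A back-of-the-envelope check, assuming rank r=16 and the published Llama-3.2-3B attention dimensions (hidden size 3072, 28 decoder layers, 1024-dim k/v projections) — all assumptions, since the rank is not visible in this diff:

```python
# LoRA adds two low-rank factors per targeted linear layer, so each of
# q/k/v/o_proj contributes r * (in_features + out_features) parameters.
r = 16            # assumed LoRA rank (not shown in this diff)
hidden = 3072     # Llama-3.2-3B hidden size; q_proj/o_proj map 3072 -> 3072
kv_out = 1024     # k_proj/v_proj output width (8 kv heads x head_dim 128)
layers = 28       # decoder layers in Llama-3.2-3B

per_layer = (
    r * (hidden + hidden)    # q_proj
    + r * (hidden + kv_out)  # k_proj
    + r * (hidden + kv_out)  # v_proj
    + r * (hidden + hidden)  # o_proj
)
total_params = per_layer * layers
fp16_bytes = total_params * 2
print(total_params, fp16_bytes)  # 9175040 18350080
```

18,350,080 bytes of fp16 weights plus safetensors metadata is consistent with the new 18,379,784-byte file, which is why r=16 is a plausible (but unconfirmed) guess.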
checkpoint-500/README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- base_model: meta-llama/Llama-3.2-1B
  library_name: peft
  ---

  ---
+ base_model: meta-llama/Llama-3.2-3B-Instruct
  library_name: peft
  ---
checkpoint-500/adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -24,10 +24,10 @@
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "v_proj",
28
- "q_proj",
29
  "k_proj",
30
- "o_proj"
 
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
+ "o_proj",
 
28
  "k_proj",
29
+ "v_proj",
30
+ "q_proj"
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
checkpoint-500/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:51dd1b6188ef2fbfbd1f1930c34693669e7f1411dc718ebc28dee1741bb7c994
- size 6832520

  version https://git-lfs.github.com/spec/v1
+ oid sha256:8b8259114c246ec9d021c0dc592a86039fa551ea32044b58899cb2e13eac109f
+ size 18379784
checkpoint-500/chat_template.jinja ADDED
@@ -0,0 +1,93 @@
+ {{- bos_token }}
+ {%- if custom_tools is defined %}
+ {%- set tools = custom_tools %}
+ {%- endif %}
+ {%- if not tools_in_user_message is defined %}
+ {%- set tools_in_user_message = true %}
+ {%- endif %}
+ {%- if not date_string is defined %}
+ {%- if strftime_now is defined %}
+ {%- set date_string = strftime_now("%d %b %Y") %}
+ {%- else %}
+ {%- set date_string = "26 Jul 2024" %}
+ {%- endif %}
+ {%- endif %}
+ {%- if not tools is defined %}
+ {%- set tools = none %}
+ {%- endif %}
+
+ {#- This block extracts the system message, so we can slot it into the right place. #}
+ {%- if messages[0]['role'] == 'system' %}
+ {%- set system_message = messages[0]['content']|trim %}
+ {%- set messages = messages[1:] %}
+ {%- else %}
+ {%- set system_message = "" %}
+ {%- endif %}
+
+ {#- System message #}
+ {{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
+ {%- if tools is not none %}
+ {{- "Environment: ipython\n" }}
+ {%- endif %}
+ {{- "Cutting Knowledge Date: December 2023\n" }}
+ {{- "Today Date: " + date_string + "\n\n" }}
+ {%- if tools is not none and not tools_in_user_message %}
+ {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+ {{- "Do not use variables.\n\n" }}
+ {%- for t in tools %}
+ {{- t | tojson(indent=4) }}
+ {{- "\n\n" }}
+ {%- endfor %}
+ {%- endif %}
+ {{- system_message }}
+ {{- "<|eot_id|>" }}
+
+ {#- Custom tools are passed in a user message with some extra guidance #}
+ {%- if tools_in_user_message and not tools is none %}
+ {#- Extract the first user message so we can plug it in here #}
+ {%- if messages | length != 0 %}
+ {%- set first_user_message = messages[0]['content']|trim %}
+ {%- set messages = messages[1:] %}
+ {%- else %}
+ {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
+ {%- endif %}
+ {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
+ {{- "Given the following functions, please respond with a JSON for a function call " }}
+ {{- "with its proper arguments that best answers the given prompt.\n\n" }}
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+ {{- "Do not use variables.\n\n" }}
+ {%- for t in tools %}
+ {{- t | tojson(indent=4) }}
+ {{- "\n\n" }}
+ {%- endfor %}
+ {{- first_user_message + "<|eot_id|>"}}
+ {%- endif %}
+
+ {%- for message in messages %}
+ {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
+ {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
+ {%- elif 'tool_calls' in message %}
+ {%- if not message.tool_calls|length == 1 %}
+ {{- raise_exception("This model only supports single tool-calls at once!") }}
+ {%- endif %}
+ {%- set tool_call = message.tool_calls[0].function %}
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
+ {{- '{"name": "' + tool_call.name + '", ' }}
+ {{- '"parameters": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- "}" }}
+ {{- "<|eot_id|>" }}
+ {%- elif message.role == "tool" or message.role == "ipython" %}
+ {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
+ {%- if message.content is mapping or message.content is iterable %}
+ {{- message.content | tojson }}
+ {%- else %}
+ {{- message.content }}
+ {%- endif %}
+ {{- "<|eot_id|>" }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
+ {%- endif %}
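For the common no-tools case, the template above reduces to a fixed frame around the messages. A plain-Python re-enactment of that case (assuming `add_generation_prompt=True`, no tools, and the Llama `<|begin_of_text|>` BOS token; in practice you would call `tokenizer.apply_chat_template` rather than formatting strings by hand):

```python
def render(system_message, user_message, date_string="26 Jul 2024"):
    """Mimic the Jinja template's output for one system + user turn, no tools."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        "Cutting Knowledge Date: December 2023\n"
        f"Today Date: {date_string}\n\n"
        f"{system_message}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"  # generation prompt
    )

prompt = render("You are a helpful assistant.", "Hello!")
print(prompt)
```

Note that each completed turn ends with `<|eot_id|>`, which is exactly why this commit retargets the tokenizer's eos/pad tokens below.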
checkpoint-500/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:38c9a3232ca6a60b67a25870c57d11d46276154b2d6b15b197b226df5fe7baa7
- size 13739130

  version https://git-lfs.github.com/spec/v1
+ oid sha256:3317afcdcf3af4f5f4ac06a49cdec7ac76a81a4e0df552be3350fb47836d8723
+ size 36888186
checkpoint-500/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b5510d5675ca4cc88865f7333ed93c57de3a9c0ab6785b70b748e89d23dc14f3
  size 14244

  version https://git-lfs.github.com/spec/v1
+ oid sha256:766e7a0607dfbb8fc62a3e9fdbd70a306ec1cbc12acb22a2ca3051403cb0f501
  size 14244
checkpoint-500/special_tokens_map.json CHANGED
@@ -7,11 +7,11 @@
    "single_word": false
  },
  "eos_token": {
-   "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
- "pad_token": "<|end_of_text|>"
  }

    "single_word": false
  },
  "eos_token": {
+   "content": "<|eot_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
+ "pad_token": "<|eot_id|>"
  }
checkpoint-500/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a9d4fd2d4afa82d8a7dadae3490fdc20b26f06e32cec78a8dc96521b4dc79038
- size 17210200

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c70650b4236027dc8db4abca6b918783a8ed2ee38cd69142f6dbbeb5945f876f
+ size 17210195
checkpoint-500/tokenizer_config.json CHANGED
@@ -2051,13 +2051,13 @@
  },
  "bos_token": "<|begin_of_text|>",
  "clean_up_tokenization_spaces": true,
- "eos_token": "<|end_of_text|>",
  "extra_special_tokens": {},
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 131072,
- "pad_token": "<|end_of_text|>",
  "tokenizer_class": "PreTrainedTokenizer"
  }

  },
  "bos_token": "<|begin_of_text|>",
  "clean_up_tokenization_spaces": true,
+ "eos_token": "<|eot_id|>",
  "extra_special_tokens": {},
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 131072,
+ "pad_token": "<|eot_id|>",
  "tokenizer_class": "PreTrainedTokenizer"
  }
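The eos/pad retarget in this file (from `<|end_of_text|>` to `<|eot_id|>`) is the usual adjustment for chat fine-tunes: `<|eot_id|>` terminates every assistant turn, so generation stops after one reply instead of running on toward the document-level end token. A hedged sketch of how this is typically done in code (the import is deferred, so the function itself is only illustrative):

```python
def retarget_special_tokens(model_id="meta-llama/Llama-3.2-3B-Instruct"):
    """Point eos/pad at the per-turn terminator used by the chat template."""
    from transformers import AutoTokenizer  # requires `transformers`

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.eos_token = "<|eot_id|>"          # stop after each assistant turn
    tokenizer.pad_token = tokenizer.eos_token   # Llama ships no dedicated pad token
    return tokenizer
```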
checkpoint-500/trainer_state.json CHANGED
@@ -11,352 +11,352 @@
The `log_history` entries were regenerated for the new run: `epoch`, `learning_rate`, and `step` are unchanged context, while `grad_norm` and `loss` were updated. Summarized below (`grad_norm` rounded to three decimals; the new-side log is truncated in this view after step 440, so those cells are marked —):

| step | epoch | learning_rate | grad_norm (old → new) | loss (old → new) |
|-----:|------:|--------------:|----------------------:|-----------------:|
| 10 | 0.074 | 4.933e-05 | 1.918 → 1.603 | 3.2618 → 3.0632 |
| 20 | 0.148 | 4.859e-05 | 2.166 → 1.629 | 2.9008 → 2.7368 |
| 30 | 0.222 | 4.785e-05 | 3.131 → 2.152 | 2.4903 → 2.3871 |
| 40 | 0.296 | 4.711e-05 | 3.064 → 1.800 | 2.1276 → 1.9196 |
| 50 | 0.370 | 4.637e-05 | 3.167 → 1.898 | 1.9145 → 1.3918 |
| 60 | 0.444 | 4.563e-05 | 3.497 → 1.896 | 1.7413 → 1.5302 |
| 70 | 0.519 | 4.489e-05 | 2.274 → 2.118 | 1.5862 → 1.5947 |
| 80 | 0.593 | 4.415e-05 | 2.132 → 1.489 | 1.5853 → 1.719 |
| 90 | 0.667 | 4.341e-05 | 2.534 → 1.629 | 1.4492 → 1.5577 |
| 100 | 0.741 | 4.267e-05 | 2.549 → 1.834 | 1.5561 → 1.5306 |
| 110 | 0.815 | 4.193e-05 | 2.276 → 1.640 | 1.376 → 1.4556 |
| 120 | 0.889 | 4.119e-05 | 2.932 → 2.472 | 1.5952 → 1.3542 |
| 130 | 0.963 | 4.044e-05 | 2.639 → 2.038 | 1.567 → 1.4318 |
| 140 | 1.037 | 3.970e-05 | 2.196 → 1.808 | 1.4817 → 1.325 |
| 150 | 1.111 | 3.896e-05 | 2.787 → 1.950 | 1.4706 → 1.4448 |
| 160 | 1.185 | 3.822e-05 | 3.132 → 1.963 | 1.5353 → 1.5878 |
| 170 | 1.259 | 3.748e-05 | 2.852 → 1.751 | 1.4258 → 1.3959 |
| 180 | 1.333 | 3.674e-05 | 2.733 → 2.053 | 1.4788 → 1.3011 |
| 190 | 1.407 | 3.600e-05 | 2.585 → 2.043 | 1.3938 → 1.4573 |
| 200 | 1.481 | 3.526e-05 | 2.708 → 2.036 | 1.4244 → 1.1328 |
| 210 | 1.556 | 3.452e-05 | 2.701 → 1.715 | 1.575 → 1.4789 |
| 220 | 1.630 | 3.378e-05 | 2.432 → 2.452 | 1.6532 → 1.5025 |
| 230 | 1.704 | 3.304e-05 | 2.494 → 1.901 | 1.5417 → 1.4632 |
| 240 | 1.778 | 3.230e-05 | 2.872 → 2.464 | 1.3412 → 1.3317 |
| 250 | 1.852 | 3.156e-05 | 3.256 → 2.167 | 1.4949 → 1.4509 |
| 260 | 1.926 | 3.081e-05 | 3.067 → 2.021 | 1.4992 → 1.3454 |
| 270 | 2.000 | 3.007e-05 | 3.030 → 2.484 | 1.5028 → 1.469 |
| 280 | 2.074 | 2.933e-05 | 3.697 → 2.236 | 1.4106 → 1.4094 |
| 290 | 2.148 | 2.859e-05 | 4.046 → 1.842 | 1.3073 → 1.3593 |
| 300 | 2.222 | 2.785e-05 | 3.199 → 2.261 | 1.4697 → 1.2538 |
| 310 | 2.296 | 2.711e-05 | 2.752 → 2.420 | 1.5942 → 1.3069 |
| 320 | 2.370 | 2.637e-05 | 2.622 → 1.993 | 1.529 → 1.3721 |
| 330 | 2.444 | 2.563e-05 | 3.084 → 1.749 | 1.3467 → 1.4116 |
| 340 | 2.519 | 2.489e-05 | 3.732 → 2.112 | 1.4712 → 1.4882 |
| 350 | 2.593 | 2.415e-05 | 3.473 → 2.643 | 1.366 → 1.3898 |
| 360 | 2.667 | 2.341e-05 | 3.592 → 2.421 | 1.4706 → 1.3122 |
| 370 | 2.741 | 2.267e-05 | 2.644 → 2.674 | 1.4946 → 1.4165 |
| 380 | 2.815 | 2.193e-05 | 4.660 → 2.851 | 1.4062 → 1.3389 |
| 390 | 2.889 | 2.119e-05 | 3.323 → 2.469 | 1.4974 → 1.2991 |
| 400 | 2.963 | 2.044e-05 | 2.832 → 2.734 | 1.3854 → 1.259 |
| 410 | 3.037 | 1.970e-05 | 2.911 → 1.964 | 1.4369 → 1.5459 |
| 420 | 3.111 | 1.896e-05 | 3.241 → 2.067 | 1.343 → 1.3024 |
| 430 | 3.185 | 1.822e-05 | 3.537 → 2.377 | 1.3801 → 1.4858 |
| 440 | 3.259 | 1.748e-05 | 3.054 → 3.471 | 1.4533 → 1.3467 |
| 450 | 3.333 | 1.674e-05 | 4.252 → — | 1.4991 → — |
| 460 | 3.407 | 1.600e-05 | 2.947 → — | 1.3662 → — |
| 470 | 3.481 | 1.526e-05 | 3.285 → — | 1.3832 → — |
| 480 | 3.556 | 1.452e-05 | 3.081 → — | 1.4724 → — |
| 490 | 3.630 | 1.378e-05 | 2.596 → — | 1.3091 → — |
| 500 | 3.704 | 1.304e-05 | 3.942 → — | 1.371 → — |

A second hunk, `@@ -377,7 +377,7 @@`, removes `"total_flos": 2994739347456000.0`; the replacement value falls outside the truncated portion shown here. The surrounding `"train_batch_size": 2`, `"trial_name": null`, and `"trial_params": null` lines are unchanged context.
  "epoch": 3.3333333333333335,
322
+ "grad_norm": 2.3406922817230225,
323
  "learning_rate": 1.674074074074074e-05,
324
+ "loss": 1.3619,
325
  "step": 450
326
  },
327
  {
328
  "epoch": 3.4074074074074074,
329
+ "grad_norm": 2.3285129070281982,
330
  "learning_rate": 1.6000000000000003e-05,
331
+ "loss": 1.4078,
332
  "step": 460
333
  },
334
  {
335
  "epoch": 3.4814814814814814,
336
+ "grad_norm": 2.5264031887054443,
337
  "learning_rate": 1.5259259259259258e-05,
338
+ "loss": 1.1562,
339
  "step": 470
340
  },
341
  {
342
  "epoch": 3.5555555555555554,
343
+ "grad_norm": 2.290501594543457,
344
  "learning_rate": 1.4518518518518521e-05,
345
+ "loss": 1.3399,
346
  "step": 480
347
  },
348
  {
349
  "epoch": 3.6296296296296298,
350
+ "grad_norm": 3.063209056854248,
351
  "learning_rate": 1.3777777777777778e-05,
352
+ "loss": 1.1793,
353
  "step": 490
354
  },
355
  {
356
  "epoch": 3.7037037037037037,
357
+ "grad_norm": 2.8260083198547363,
358
  "learning_rate": 1.3037037037037036e-05,
359
+ "loss": 1.4168,
360
  "step": 500
361
  }
362
  ],
 
377
  "attributes": {}
378
  }
379
  },
380
+ "total_flos": 8673284849664000.0,
381
  "train_batch_size": 2,
382
  "trial_name": null,
383
  "trial_params": null
checkpoint-500/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e97160594c45d89ea3fd9c68265c308064022978d7a1e8dc093e9bed35cf1cf7
3
  size 5304
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c06b70c6d22e534aee54e60ea3091f1eeba55994a544d47464b4a805ef2ab30e
3
  size 5304
checkpoint-675/README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- base_model: meta-llama/Llama-3.2-1B
3
  library_name: peft
4
  ---
5
 
 
1
  ---
2
+ base_model: meta-llama/Llama-3.2-3B-Instruct
3
  library_name: peft
4
  ---
5
 
checkpoint-675/adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -24,10 +24,10 @@
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "v_proj",
28
- "q_proj",
29
  "k_proj",
30
- "o_proj"
 
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
+ "o_proj",
 
28
  "k_proj",
29
+ "v_proj",
30
+ "q_proj"
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
checkpoint-675/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:85a94d974409456e3c95935ba3868ea2a1ce6587e7ca88a8214846c9ee0130dd
3
- size 6832520
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cc7ea701768cfbdaa71d079b215f5f549a6f19783aa970133015c0bcd11942a9
3
+ size 18379784
checkpoint-675/chat_template.jinja ADDED
@@ -0,0 +1,93 @@
1
+ {{- bos_token }}
2
+ {%- if custom_tools is defined %}
3
+ {%- set tools = custom_tools %}
4
+ {%- endif %}
5
+ {%- if not tools_in_user_message is defined %}
6
+ {%- set tools_in_user_message = true %}
7
+ {%- endif %}
8
+ {%- if not date_string is defined %}
9
+ {%- if strftime_now is defined %}
10
+ {%- set date_string = strftime_now("%d %b %Y") %}
11
+ {%- else %}
12
+ {%- set date_string = "26 Jul 2024" %}
13
+ {%- endif %}
14
+ {%- endif %}
15
+ {%- if not tools is defined %}
16
+ {%- set tools = none %}
17
+ {%- endif %}
18
+
19
+ {#- This block extracts the system message, so we can slot it into the right place. #}
20
+ {%- if messages[0]['role'] == 'system' %}
21
+ {%- set system_message = messages[0]['content']|trim %}
22
+ {%- set messages = messages[1:] %}
23
+ {%- else %}
24
+ {%- set system_message = "" %}
25
+ {%- endif %}
26
+
27
+ {#- System message #}
28
+ {{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
29
+ {%- if tools is not none %}
30
+ {{- "Environment: ipython\n" }}
31
+ {%- endif %}
32
+ {{- "Cutting Knowledge Date: December 2023\n" }}
33
+ {{- "Today Date: " + date_string + "\n\n" }}
34
+ {%- if tools is not none and not tools_in_user_message %}
35
+ {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
36
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
37
+ {{- "Do not use variables.\n\n" }}
38
+ {%- for t in tools %}
39
+ {{- t | tojson(indent=4) }}
40
+ {{- "\n\n" }}
41
+ {%- endfor %}
42
+ {%- endif %}
43
+ {{- system_message }}
44
+ {{- "<|eot_id|>" }}
45
+
46
+ {#- Custom tools are passed in a user message with some extra guidance #}
47
+ {%- if tools_in_user_message and not tools is none %}
48
+ {#- Extract the first user message so we can plug it in here #}
49
+ {%- if messages | length != 0 %}
50
+ {%- set first_user_message = messages[0]['content']|trim %}
51
+ {%- set messages = messages[1:] %}
52
+ {%- else %}
53
+ {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
54
+ {%- endif %}
55
+ {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
56
+ {{- "Given the following functions, please respond with a JSON for a function call " }}
57
+ {{- "with its proper arguments that best answers the given prompt.\n\n" }}
58
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
59
+ {{- "Do not use variables.\n\n" }}
60
+ {%- for t in tools %}
61
+ {{- t | tojson(indent=4) }}
62
+ {{- "\n\n" }}
63
+ {%- endfor %}
64
+ {{- first_user_message + "<|eot_id|>"}}
65
+ {%- endif %}
66
+
67
+ {%- for message in messages %}
68
+ {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
69
+ {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
70
+ {%- elif 'tool_calls' in message %}
71
+ {%- if not message.tool_calls|length == 1 %}
72
+ {{- raise_exception("This model only supports single tool-calls at once!") }}
73
+ {%- endif %}
74
+ {%- set tool_call = message.tool_calls[0].function %}
75
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
76
+ {{- '{"name": "' + tool_call.name + '", ' }}
77
+ {{- '"parameters": ' }}
78
+ {{- tool_call.arguments | tojson }}
79
+ {{- "}" }}
80
+ {{- "<|eot_id|>" }}
81
+ {%- elif message.role == "tool" or message.role == "ipython" %}
82
+ {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
83
+ {%- if message.content is mapping or message.content is iterable %}
84
+ {{- message.content | tojson }}
85
+ {%- else %}
86
+ {{- message.content }}
87
+ {%- endif %}
88
+ {{- "<|eot_id|>" }}
89
+ {%- endif %}
90
+ {%- endfor %}
91
+ {%- if add_generation_prompt %}
92
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
93
+ {%- endif %}
checkpoint-675/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e16eb51edd416344ec01ae08161f75e5b39c6a771c90092b2efd6bfbe216820b
3
- size 13739130
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:575fbe3b0aeb9ea62a324077fb9c3c2cbe9882ac13f98d5feb1e3150f6354d2b
3
+ size 36888186
checkpoint-675/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6f2ea9c0c0d5c060e3f0c36ca552127cfbc3cb0e8231b97b11065f63c83d513f
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11f806a90936250186b7351d45b954a2dacb8a2cb0336a0049a5107f2a56eceb
3
  size 14244
checkpoint-675/special_tokens_map.json CHANGED
@@ -7,11 +7,11 @@
7
  "single_word": false
8
  },
9
  "eos_token": {
10
- "content": "<|end_of_text|>",
11
  "lstrip": false,
12
  "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
16
- "pad_token": "<|end_of_text|>"
17
  }
 
7
  "single_word": false
8
  },
9
  "eos_token": {
10
+ "content": "<|eot_id|>",
11
  "lstrip": false,
12
  "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
16
+ "pad_token": "<|eot_id|>"
17
  }
checkpoint-675/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a9d4fd2d4afa82d8a7dadae3490fdc20b26f06e32cec78a8dc96521b4dc79038
3
- size 17210200
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c70650b4236027dc8db4abca6b918783a8ed2ee38cd69142f6dbbeb5945f876f
3
+ size 17210195
checkpoint-675/tokenizer_config.json CHANGED
@@ -2051,13 +2051,13 @@
2051
  },
2052
  "bos_token": "<|begin_of_text|>",
2053
  "clean_up_tokenization_spaces": true,
2054
- "eos_token": "<|end_of_text|>",
2055
  "extra_special_tokens": {},
2056
  "model_input_names": [
2057
  "input_ids",
2058
  "attention_mask"
2059
  ],
2060
  "model_max_length": 131072,
2061
- "pad_token": "<|end_of_text|>",
2062
  "tokenizer_class": "PreTrainedTokenizer"
2063
  }
 
2051
  },
2052
  "bos_token": "<|begin_of_text|>",
2053
  "clean_up_tokenization_spaces": true,
2054
+ "eos_token": "<|eot_id|>",
2055
  "extra_special_tokens": {},
2056
  "model_input_names": [
2057
  "input_ids",
2058
  "attention_mask"
2059
  ],
2060
  "model_max_length": 131072,
2061
+ "pad_token": "<|eot_id|>",
2062
  "tokenizer_class": "PreTrainedTokenizer"
2063
  }
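The tokenizer change above (eos_token and pad_token moved from <|end_of_text|> to <|eot_id|>) matches the switch to the Instruct base model: generation then stops at the end of an assistant turn rather than only at end-of-text. The changed fields as plain data (sketch):

```python
# Sketch of the tokenizer_config.json fields this commit changes;
# padding with the eos token avoids adding a separate pad embedding.
tokenizer_config = {
    "bos_token": "<|begin_of_text|>",
    "eos_token": "<|eot_id|>",
    "pad_token": "<|eot_id|>",
    "model_max_length": 131072,
}

assert tokenizer_config["pad_token"] == tokenizer_config["eos_token"]
```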
checkpoint-675/trainer_state.json CHANGED
@@ -11,471 +11,471 @@
11
  "log_history": [
12
  {
13
  "epoch": 0.07407407407407407,
14
- "grad_norm": 1.9176784753799438,
15
  "learning_rate": 4.933333333333334e-05,
16
- "loss": 3.2618,
17
  "step": 10
18
  },
19
  {
20
  "epoch": 0.14814814814814814,
21
- "grad_norm": 2.166156053543091,
22
  "learning_rate": 4.8592592592592596e-05,
23
- "loss": 2.9008,
24
  "step": 20
25
  },
26
  {
27
  "epoch": 0.2222222222222222,
28
- "grad_norm": 3.130929470062256,
29
  "learning_rate": 4.7851851851851854e-05,
30
- "loss": 2.4903,
31
  "step": 30
32
  },
33
  {
34
  "epoch": 0.2962962962962963,
35
- "grad_norm": 3.0643246173858643,
36
  "learning_rate": 4.711111111111111e-05,
37
- "loss": 2.1276,
38
  "step": 40
39
  },
40
  {
41
  "epoch": 0.37037037037037035,
42
- "grad_norm": 3.166916847229004,
43
  "learning_rate": 4.637037037037038e-05,
44
- "loss": 1.9145,
45
  "step": 50
46
  },
47
  {
48
  "epoch": 0.4444444444444444,
49
- "grad_norm": 3.496783494949341,
50
  "learning_rate": 4.5629629629629636e-05,
51
- "loss": 1.7413,
52
  "step": 60
53
  },
54
  {
55
  "epoch": 0.5185185185185185,
56
- "grad_norm": 2.274343490600586,
57
  "learning_rate": 4.4888888888888894e-05,
58
- "loss": 1.5862,
59
  "step": 70
60
  },
61
  {
62
  "epoch": 0.5925925925925926,
63
- "grad_norm": 2.1317834854125977,
64
  "learning_rate": 4.414814814814815e-05,
65
- "loss": 1.5853,
66
  "step": 80
67
  },
68
  {
69
  "epoch": 0.6666666666666666,
70
- "grad_norm": 2.5336570739746094,
71
  "learning_rate": 4.340740740740741e-05,
72
- "loss": 1.4492,
73
  "step": 90
74
  },
75
  {
76
  "epoch": 0.7407407407407407,
77
- "grad_norm": 2.5489163398742676,
78
  "learning_rate": 4.266666666666667e-05,
79
- "loss": 1.5561,
80
  "step": 100
81
  },
82
  {
83
  "epoch": 0.8148148148148148,
84
- "grad_norm": 2.276472568511963,
85
  "learning_rate": 4.192592592592593e-05,
86
- "loss": 1.376,
87
  "step": 110
88
  },
89
  {
90
  "epoch": 0.8888888888888888,
91
- "grad_norm": 2.9320948123931885,
92
  "learning_rate": 4.1185185185185186e-05,
93
- "loss": 1.5952,
94
  "step": 120
95
  },
96
  {
97
  "epoch": 0.9629629629629629,
98
- "grad_norm": 2.639327049255371,
99
  "learning_rate": 4.0444444444444444e-05,
100
- "loss": 1.567,
101
  "step": 130
102
  },
103
  {
104
  "epoch": 1.037037037037037,
105
- "grad_norm": 2.1957807540893555,
106
  "learning_rate": 3.97037037037037e-05,
107
- "loss": 1.4817,
108
  "step": 140
109
  },
110
  {
111
  "epoch": 1.1111111111111112,
112
- "grad_norm": 2.7867722511291504,
113
  "learning_rate": 3.896296296296296e-05,
114
- "loss": 1.4706,
115
  "step": 150
116
  },
117
  {
118
  "epoch": 1.1851851851851851,
119
- "grad_norm": 3.132254123687744,
120
  "learning_rate": 3.8222222222222226e-05,
121
- "loss": 1.5353,
122
  "step": 160
123
  },
124
  {
125
  "epoch": 1.2592592592592593,
126
- "grad_norm": 2.851921319961548,
127
  "learning_rate": 3.7481481481481484e-05,
128
- "loss": 1.4258,
129
  "step": 170
130
  },
131
  {
132
  "epoch": 1.3333333333333333,
133
- "grad_norm": 2.733062505722046,
134
  "learning_rate": 3.674074074074074e-05,
135
- "loss": 1.4788,
136
  "step": 180
137
  },
138
  {
139
  "epoch": 1.4074074074074074,
140
- "grad_norm": 2.58499813079834,
141
  "learning_rate": 3.6e-05,
142
- "loss": 1.3938,
143
  "step": 190
144
  },
145
  {
146
  "epoch": 1.4814814814814814,
147
- "grad_norm": 2.7078592777252197,
148
  "learning_rate": 3.525925925925926e-05,
149
- "loss": 1.4244,
150
  "step": 200
151
  },
152
  {
153
  "epoch": 1.5555555555555556,
154
- "grad_norm": 2.7007601261138916,
155
  "learning_rate": 3.4518518518518524e-05,
156
- "loss": 1.575,
157
  "step": 210
158
  },
159
  {
160
  "epoch": 1.6296296296296298,
161
- "grad_norm": 2.4323105812072754,
162
  "learning_rate": 3.377777777777778e-05,
163
- "loss": 1.6532,
164
  "step": 220
165
  },
166
  {
167
  "epoch": 1.7037037037037037,
168
- "grad_norm": 2.4938671588897705,
169
  "learning_rate": 3.303703703703704e-05,
170
- "loss": 1.5417,
171
  "step": 230
172
  },
173
  {
174
  "epoch": 1.7777777777777777,
175
- "grad_norm": 2.872101068496704,
176
  "learning_rate": 3.22962962962963e-05,
177
- "loss": 1.3412,
178
  "step": 240
179
  },
180
  {
181
  "epoch": 1.8518518518518519,
182
- "grad_norm": 3.255509614944458,
183
  "learning_rate": 3.155555555555556e-05,
184
- "loss": 1.4949,
185
  "step": 250
186
  },
187
  {
188
  "epoch": 1.925925925925926,
189
- "grad_norm": 3.0668418407440186,
190
  "learning_rate": 3.0814814814814816e-05,
191
- "loss": 1.4992,
192
  "step": 260
193
  },
194
  {
195
  "epoch": 2.0,
196
- "grad_norm": 3.030184745788574,
197
  "learning_rate": 3.0074074074074078e-05,
198
- "loss": 1.5028,
199
  "step": 270
200
  },
201
  {
202
  "epoch": 2.074074074074074,
203
- "grad_norm": 3.6970374584198,
204
  "learning_rate": 2.9333333333333336e-05,
205
- "loss": 1.4106,
206
  "step": 280
207
  },
208
  {
209
  "epoch": 2.148148148148148,
210
- "grad_norm": 4.04591178894043,
211
  "learning_rate": 2.8592592592592594e-05,
212
- "loss": 1.3073,
213
  "step": 290
214
  },
215
  {
216
  "epoch": 2.2222222222222223,
217
- "grad_norm": 3.198578357696533,
218
  "learning_rate": 2.7851851851851853e-05,
219
- "loss": 1.4697,
220
  "step": 300
221
  },
222
  {
223
  "epoch": 2.2962962962962963,
224
- "grad_norm": 2.752206802368164,
225
  "learning_rate": 2.7111111111111114e-05,
226
- "loss": 1.5942,
227
  "step": 310
228
  },
229
  {
230
  "epoch": 2.3703703703703702,
231
- "grad_norm": 2.6222379207611084,
232
  "learning_rate": 2.6370370370370373e-05,
233
- "loss": 1.529,
234
  "step": 320
235
  },
236
  {
237
  "epoch": 2.4444444444444446,
238
- "grad_norm": 3.0837435722351074,
239
  "learning_rate": 2.562962962962963e-05,
240
- "loss": 1.3467,
241
  "step": 330
242
  },
243
  {
244
  "epoch": 2.5185185185185186,
245
- "grad_norm": 3.7321062088012695,
246
  "learning_rate": 2.488888888888889e-05,
247
- "loss": 1.4712,
248
  "step": 340
249
  },
250
  {
251
  "epoch": 2.5925925925925926,
252
- "grad_norm": 3.4725160598754883,
253
  "learning_rate": 2.414814814814815e-05,
254
- "loss": 1.366,
255
  "step": 350
256
  },
257
  {
258
  "epoch": 2.6666666666666665,
259
- "grad_norm": 3.5917716026306152,
260
  "learning_rate": 2.340740740740741e-05,
261
- "loss": 1.4706,
262
  "step": 360
263
  },
264
  {
265
  "epoch": 2.7407407407407405,
266
- "grad_norm": 2.643585205078125,
267
  "learning_rate": 2.2666666666666668e-05,
268
- "loss": 1.4946,
269
  "step": 370
270
  },
271
  {
272
  "epoch": 2.814814814814815,
273
- "grad_norm": 4.659608364105225,
274
  "learning_rate": 2.1925925925925926e-05,
275
- "loss": 1.4062,
276
  "step": 380
277
  },
278
  {
279
  "epoch": 2.888888888888889,
280
- "grad_norm": 3.32312273979187,
281
  "learning_rate": 2.1185185185185184e-05,
282
- "loss": 1.4974,
283
  "step": 390
284
  },
285
  {
286
  "epoch": 2.962962962962963,
287
- "grad_norm": 2.8320910930633545,
288
  "learning_rate": 2.0444444444444446e-05,
289
- "loss": 1.3854,
290
  "step": 400
291
  },
292
  {
293
  "epoch": 3.037037037037037,
294
- "grad_norm": 2.9114246368408203,
295
  "learning_rate": 1.9703703703703704e-05,
296
- "loss": 1.4369,
297
  "step": 410
298
  },
299
  {
300
  "epoch": 3.111111111111111,
301
- "grad_norm": 3.240769147872925,
302
  "learning_rate": 1.8962962962962963e-05,
303
- "loss": 1.343,
304
  "step": 420
305
  },
306
  {
307
  "epoch": 3.185185185185185,
308
- "grad_norm": 3.537137985229492,
309
  "learning_rate": 1.8222222222222224e-05,
310
- "loss": 1.3801,
311
  "step": 430
312
  },
313
  {
314
  "epoch": 3.259259259259259,
315
- "grad_norm": 3.054455518722534,
316
  "learning_rate": 1.7481481481481483e-05,
317
- "loss": 1.4533,
318
  "step": 440
319
  },
320
  {
321
  "epoch": 3.3333333333333335,
322
- "grad_norm": 4.251873016357422,
323
  "learning_rate": 1.674074074074074e-05,
324
- "loss": 1.4991,
325
  "step": 450
326
  },
327
  {
328
  "epoch": 3.4074074074074074,
329
- "grad_norm": 2.9473700523376465,
330
  "learning_rate": 1.6000000000000003e-05,
331
- "loss": 1.3662,
332
  "step": 460
333
  },
334
  {
335
  "epoch": 3.4814814814814814,
336
- "grad_norm": 3.284587860107422,
337
  "learning_rate": 1.5259259259259258e-05,
338
- "loss": 1.3832,
339
  "step": 470
340
  },
341
  {
342
  "epoch": 3.5555555555555554,
343
- "grad_norm": 3.0811917781829834,
344
  "learning_rate": 1.4518518518518521e-05,
345
- "loss": 1.4724,
346
  "step": 480
347
  },
348
  {
349
  "epoch": 3.6296296296296298,
350
- "grad_norm": 2.595721960067749,
351
  "learning_rate": 1.3777777777777778e-05,
352
- "loss": 1.3091,
353
  "step": 490
354
  },
355
  {
356
  "epoch": 3.7037037037037037,
357
- "grad_norm": 3.941594123840332,
358
  "learning_rate": 1.3037037037037036e-05,
359
- "loss": 1.371,
360
  "step": 500
361
  },
362
  {
363
  "epoch": 3.7777777777777777,
364
- "grad_norm": 3.5405843257904053,
365
  "learning_rate": 1.2296296296296298e-05,
366
- "loss": 1.3644,
367
  "step": 510
368
  },
369
  {
370
  "epoch": 3.851851851851852,
371
- "grad_norm": 2.9564130306243896,
372
  "learning_rate": 1.1555555555555556e-05,
373
- "loss": 1.458,
374
  "step": 520
375
  },
376
  {
377
  "epoch": 3.925925925925926,
378
- "grad_norm": 2.8802552223205566,
379
  "learning_rate": 1.0814814814814814e-05,
380
- "loss": 1.4408,
381
  "step": 530
382
  },
383
  {
384
  "epoch": 4.0,
385
- "grad_norm": 3.0077877044677734,
386
  "learning_rate": 1.0074074074074074e-05,
387
- "loss": 1.4497,
388
  "step": 540
389
  },
390
  {
391
  "epoch": 4.074074074074074,
392
- "grad_norm": 3.43784761428833,
393
  "learning_rate": 9.333333333333334e-06,
394
- "loss": 1.4439,
395
  "step": 550
396
  },
397
  {
398
  "epoch": 4.148148148148148,
399
- "grad_norm": 3.5418014526367188,
400
  "learning_rate": 8.592592592592593e-06,
401
- "loss": 1.4832,
402
  "step": 560
403
  },
404
  {
405
  "epoch": 4.222222222222222,
406
- "grad_norm": 3.1893768310546875,
407
  "learning_rate": 7.851851851851853e-06,
408
- "loss": 1.3177,
409
  "step": 570
410
  },
411
  {
412
  "epoch": 4.296296296296296,
413
- "grad_norm": 3.522493839263916,
414
  "learning_rate": 7.111111111111112e-06,
415
- "loss": 1.2925,
416
  "step": 580
417
  },
418
  {
419
  "epoch": 4.37037037037037,
420
- "grad_norm": 3.473977565765381,
421
  "learning_rate": 6.370370370370371e-06,
422
- "loss": 1.4198,
423
  "step": 590
424
  },
425
  {
426
  "epoch": 4.444444444444445,
427
- "grad_norm": 4.043973445892334,
428
  "learning_rate": 5.62962962962963e-06,
429
- "loss": 1.3265,
430
  "step": 600
431
  },
432
  {
433
  "epoch": 4.518518518518518,
434
- "grad_norm": 4.024613857269287,
435
  "learning_rate": 4.888888888888889e-06,
436
- "loss": 1.3528,
437
  "step": 610
438
  },
439
  {
440
  "epoch": 4.592592592592593,
441
- "grad_norm": 3.266040802001953,
442
  "learning_rate": 4.1481481481481485e-06,
443
- "loss": 1.3553,
444
  "step": 620
445
  },
446
  {
447
  "epoch": 4.666666666666667,
448
- "grad_norm": 5.175985813140869,
449
  "learning_rate": 3.4074074074074077e-06,
450
- "loss": 1.332,
451
  "step": 630
452
  },
453
  {
454
  "epoch": 4.7407407407407405,
455
- "grad_norm": 3.355964422225952,
456
  "learning_rate": 2.666666666666667e-06,
457
- "loss": 1.5673,
458
  "step": 640
459
  },
460
  {
461
  "epoch": 4.814814814814815,
462
- "grad_norm": 3.170093297958374,
463
  "learning_rate": 1.925925925925926e-06,
464
- "loss": 1.4978,
465
  "step": 650
466
  },
467
  {
468
  "epoch": 4.888888888888889,
469
- "grad_norm": 3.5066723823547363,
470
  "learning_rate": 1.1851851851851852e-06,
471
- "loss": 1.3474,
472
  "step": 660
473
  },
474
  {
475
  "epoch": 4.962962962962963,
476
- "grad_norm": 3.7922685146331787,
477
  "learning_rate": 4.444444444444445e-07,
478
- "loss": 1.3518,
479
  "step": 670
480
  }
481
  ],
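The learning_rate values in the log above follow the Trainer's linear decay from the configured 5e-05. A quick consistency check (assuming 675 total optimizer steps, as the checkpoint-675 name suggests, with the rate logged one step behind the reported step — both are inferences from this trainer_state, not read from a config):

```python
# Sketch: linear LR decay lr = lr0 * (T - t) / T, checked against two
# values from the log. T = 675 and the one-step logging offset are
# assumptions inferred from this trainer_state.
def linear_lr(lr0, t, total):
    return lr0 * max(0.0, (total - t) / total)

assert abs(linear_lr(5e-05, 9, 675) - 4.933333333333334e-05) < 1e-10    # logged at step 10
assert abs(linear_lr(5e-05, 669, 675) - 4.444444444444445e-07) < 1e-12  # logged at step 670
```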
@@ -496,7 +496,7 @@
496
  "attributes": {}
497
  }
498
  },
499
- "total_flos": 4042898119065600.0,
500
  "train_batch_size": 2,
501
  "trial_name": null,
502
  "trial_params": null
 
11
  "log_history": [
12
  {
13
  "epoch": 0.07407407407407407,
14
+ "grad_norm": 1.6033002138137817,
15
  "learning_rate": 4.933333333333334e-05,
16
+ "loss": 3.0632,
17
  "step": 10
18
  },
19
  {
20
  "epoch": 0.14814814814814814,
21
+ "grad_norm": 1.6290810108184814,
22
  "learning_rate": 4.8592592592592596e-05,
23
+ "loss": 2.7368,
24
  "step": 20
25
  },
26
  {
27
  "epoch": 0.2222222222222222,
28
+ "grad_norm": 2.151897430419922,
29
  "learning_rate": 4.7851851851851854e-05,
30
+ "loss": 2.3871,
31
  "step": 30
32
  },
33
  {
34
  "epoch": 0.2962962962962963,
35
+ "grad_norm": 1.8000338077545166,
36
  "learning_rate": 4.711111111111111e-05,
37
+ "loss": 1.9196,
38
  "step": 40
39
  },
40
  {
41
  "epoch": 0.37037037037037035,
42
+ "grad_norm": 1.8977078199386597,
43
  "learning_rate": 4.637037037037038e-05,
44
+ "loss": 1.3918,
45
  "step": 50
46
  },
47
  {
48
  "epoch": 0.4444444444444444,
49
+ "grad_norm": 1.895778775215149,
50
  "learning_rate": 4.5629629629629636e-05,
51
+ "loss": 1.5302,
52
  "step": 60
53
  },
54
  {
55
  "epoch": 0.5185185185185185,
56
+ "grad_norm": 2.118054151535034,
57
  "learning_rate": 4.4888888888888894e-05,
58
+ "loss": 1.5947,
59
  "step": 70
60
  },
61
  {
62
  "epoch": 0.5925925925925926,
63
+ "grad_norm": 1.488535761833191,
64
  "learning_rate": 4.414814814814815e-05,
65
+ "loss": 1.719,
66
  "step": 80
67
  },
68
  {
69
  "epoch": 0.6666666666666666,
70
+ "grad_norm": 1.6291440725326538,
71
  "learning_rate": 4.340740740740741e-05,
72
+ "loss": 1.5577,
73
  "step": 90
74
  },
75
  {
76
  "epoch": 0.7407407407407407,
77
+ "grad_norm": 1.8335853815078735,
78
  "learning_rate": 4.266666666666667e-05,
79
+ "loss": 1.5306,
80
  "step": 100
81
  },
82
  {
83
  "epoch": 0.8148148148148148,
84
+ "grad_norm": 1.6403965950012207,
85
  "learning_rate": 4.192592592592593e-05,
86
+ "loss": 1.4556,
87
  "step": 110
88
  },
89
  {
90
  "epoch": 0.8888888888888888,
91
+ "grad_norm": 2.472151279449463,
92
  "learning_rate": 4.1185185185185186e-05,
93
+ "loss": 1.3542,
94
  "step": 120
95
  },
96
  {
97
  "epoch": 0.9629629629629629,
98
+ "grad_norm": 2.03757643699646,
99
  "learning_rate": 4.0444444444444444e-05,
100
+ "loss": 1.4318,
101
  "step": 130
102
  },
103
  {
104
  "epoch": 1.037037037037037,
105
+ "grad_norm": 1.8082479238510132,
106
  "learning_rate": 3.97037037037037e-05,
107
+ "loss": 1.325,
108
  "step": 140
109
  },
110
  {
111
  "epoch": 1.1111111111111112,
112
+ "grad_norm": 1.9503273963928223,
113
  "learning_rate": 3.896296296296296e-05,
114
+ "loss": 1.4448,
115
  "step": 150
116
  },
117
  {
118
  "epoch": 1.1851851851851851,
119
+ "grad_norm": 1.9627147912979126,
120
  "learning_rate": 3.8222222222222226e-05,
121
+ "loss": 1.5878,
122
  "step": 160
123
  },
124
  {
125
  "epoch": 1.2592592592592593,
126
+ "grad_norm": 1.7511639595031738,
127
  "learning_rate": 3.7481481481481484e-05,
128
+ "loss": 1.3959,
129
  "step": 170
130
  },
131
  {
132
  "epoch": 1.3333333333333333,
133
+ "grad_norm": 2.0530567169189453,
134
  "learning_rate": 3.674074074074074e-05,
135
+ "loss": 1.3011,
136
  "step": 180
137
  },
138
  {
139
  "epoch": 1.4074074074074074,
140
+ "grad_norm": 2.0430173873901367,
141
  "learning_rate": 3.6e-05,
142
+ "loss": 1.4573,
143
  "step": 190
144
  },
145
  {
146
  "epoch": 1.4814814814814814,
147
+ "grad_norm": 2.0357518196105957,
148
  "learning_rate": 3.525925925925926e-05,
149
+ "loss": 1.1328,
150
  "step": 200
151
  },
152
  {
153
  "epoch": 1.5555555555555556,
154
+ "grad_norm": 1.7147893905639648,
155
  "learning_rate": 3.4518518518518524e-05,
156
+ "loss": 1.4789,
157
  "step": 210
158
  },
159
  {
160
  "epoch": 1.6296296296296298,
161
+ "grad_norm": 2.4516425132751465,
162
  "learning_rate": 3.377777777777778e-05,
163
+ "loss": 1.5025,
164
  "step": 220
165
  },
166
  {
167
  "epoch": 1.7037037037037037,
168
+ "grad_norm": 1.9009228944778442,
169
  "learning_rate": 3.303703703703704e-05,
170
+ "loss": 1.4632,
171
  "step": 230
172
  },
173
  {
174
  "epoch": 1.7777777777777777,
175
+ "grad_norm": 2.4635581970214844,
176
  "learning_rate": 3.22962962962963e-05,
177
+ "loss": 1.3317,
178
  "step": 240
179
  },
180
  {
181
  "epoch": 1.8518518518518519,
182
+ "grad_norm": 2.166893243789673,
183
  "learning_rate": 3.155555555555556e-05,
184
+ "loss": 1.4509,
185
  "step": 250
186
  },
187
  {
188
  "epoch": 1.925925925925926,
189
+ "grad_norm": 2.0209872722625732,
190
  "learning_rate": 3.0814814814814816e-05,
191
+ "loss": 1.3454,
192
  "step": 260
193
  },
194
  {
195
  "epoch": 2.0,
196
+ "grad_norm": 2.484250545501709,
197
  "learning_rate": 3.0074074074074078e-05,
198
+ "loss": 1.469,
199
  "step": 270
200
  },
201
  {
202
  "epoch": 2.074074074074074,
203
+ "grad_norm": 2.2359848022460938,
204
  "learning_rate": 2.9333333333333336e-05,
205
+ "loss": 1.4094,
206
  "step": 280
207
  },
208
  {
209
  "epoch": 2.148148148148148,
210
+ "grad_norm": 1.8419456481933594,
211
  "learning_rate": 2.8592592592592594e-05,
212
+ "loss": 1.3593,
213
  "step": 290
214
  },
215
  {
216
  "epoch": 2.2222222222222223,
217
+ "grad_norm": 2.260558605194092,
218
  "learning_rate": 2.7851851851851853e-05,
219
+ "loss": 1.2538,
220
  "step": 300
221
  },
222
  {
223
  "epoch": 2.2962962962962963,
224
+ "grad_norm": 2.419581890106201,
225
  "learning_rate": 2.7111111111111114e-05,
226
+ "loss": 1.3069,
227
  "step": 310
228
  },
229
  {
230
  "epoch": 2.3703703703703702,
231
+ "grad_norm": 1.992509126663208,
232
  "learning_rate": 2.6370370370370373e-05,
233
+ "loss": 1.3721,
234
  "step": 320
235
  },
236
  {
237
  "epoch": 2.4444444444444446,
238
+ "grad_norm": 1.7485105991363525,
239
  "learning_rate": 2.562962962962963e-05,
240
+ "loss": 1.4116,
241
  "step": 330
242
  },
243
  {
244
  "epoch": 2.5185185185185186,
245
+ "grad_norm": 2.112185478210449,
246
  "learning_rate": 2.488888888888889e-05,
247
+ "loss": 1.4882,
248
  "step": 340
249
  },
250
  {
251
  "epoch": 2.5925925925925926,
252
+ "grad_norm": 2.6426734924316406,
253
  "learning_rate": 2.414814814814815e-05,
254
+ "loss": 1.3898,
255
  "step": 350
256
  },
257
  {
258
  "epoch": 2.6666666666666665,
259
+ "grad_norm": 2.420663833618164,
260
  "learning_rate": 2.340740740740741e-05,
261
+ "loss": 1.3122,
262
  "step": 360
263
  },
264
  {
265
  "epoch": 2.7407407407407405,
266
+ "grad_norm": 2.674475908279419,
267
  "learning_rate": 2.2666666666666668e-05,
268
+ "loss": 1.4165,
269
  "step": 370
270
  },
271
  {
272
  "epoch": 2.814814814814815,
273
+ "grad_norm": 2.850975275039673,
274
  "learning_rate": 2.1925925925925926e-05,
275
+ "loss": 1.3389,
276
  "step": 380
277
  },
278
  {
279
  "epoch": 2.888888888888889,
280
+ "grad_norm": 2.469388246536255,
281
  "learning_rate": 2.1185185185185184e-05,
282
+ "loss": 1.2991,
283
  "step": 390
284
  },
285
  {
286
  "epoch": 2.962962962962963,
287
+ "grad_norm": 2.733851194381714,
288
  "learning_rate": 2.0444444444444446e-05,
289
+ "loss": 1.259,
290
  "step": 400
291
  },
292
  {
293
  "epoch": 3.037037037037037,
294
+ "grad_norm": 1.964146375656128,
295
  "learning_rate": 1.9703703703703704e-05,
296
+ "loss": 1.5459,
297
  "step": 410
298
  },
299
  {
300
  "epoch": 3.111111111111111,
301
+ "grad_norm": 2.0667080879211426,
302
  "learning_rate": 1.8962962962962963e-05,
303
+ "loss": 1.3024,
304
  "step": 420
305
  },
306
  {
307
  "epoch": 3.185185185185185,
308
+ "grad_norm": 2.3768820762634277,
309
  "learning_rate": 1.8222222222222224e-05,
310
+ "loss": 1.4858,
311
  "step": 430
312
  },
313
  {
314
  "epoch": 3.259259259259259,
315
+ "grad_norm": 3.4706430435180664,
316
  "learning_rate": 1.7481481481481483e-05,
317
+ "loss": 1.3467,
318
  "step": 440
319
  },
320
  {
321
  "epoch": 3.3333333333333335,
322
+ "grad_norm": 2.3406922817230225,
323
  "learning_rate": 1.674074074074074e-05,
324
+ "loss": 1.3619,
325
  "step": 450
326
  },
327
  {
328
  "epoch": 3.4074074074074074,
329
+ "grad_norm": 2.3285129070281982,
330
  "learning_rate": 1.6000000000000003e-05,
331
+ "loss": 1.4078,
332
  "step": 460
333
  },
334
  {
335
  "epoch": 3.4814814814814814,
336
+ "grad_norm": 2.5264031887054443,
337
  "learning_rate": 1.5259259259259258e-05,
338
+ "loss": 1.1562,
339
  "step": 470
340
  },
341
  {
342
  "epoch": 3.5555555555555554,
343
+ "grad_norm": 2.290501594543457,
344
  "learning_rate": 1.4518518518518521e-05,
345
+ "loss": 1.3399,
346
  "step": 480
347
  },
348
  {
349
  "epoch": 3.6296296296296298,
350
+ "grad_norm": 3.063209056854248,
351
  "learning_rate": 1.3777777777777778e-05,
352
+ "loss": 1.1793,
353
  "step": 490
354
  },
355
  {
356
  "epoch": 3.7037037037037037,
357
+ "grad_norm": 2.8260083198547363,
358
  "learning_rate": 1.3037037037037036e-05,
359
+ "loss": 1.4168,
360
  "step": 500
361
  },
362
  {
363
  "epoch": 3.7777777777777777,
364
+ "grad_norm": 2.5373244285583496,
365
  "learning_rate": 1.2296296296296298e-05,
366
+ "loss": 1.2364,
367
  "step": 510
368
  },
369
  {
370
  "epoch": 3.851851851851852,
371
+ "grad_norm": 2.6455769538879395,
372
  "learning_rate": 1.1555555555555556e-05,
373
+ "loss": 1.278,
374
  "step": 520
375
  },
376
  {
377
  "epoch": 3.925925925925926,
378
+ "grad_norm": 2.7349331378936768,
379
  "learning_rate": 1.0814814814814814e-05,
380
+ "loss": 1.3113,
381
  "step": 530
382
  },
383
  {
384
  "epoch": 4.0,
385
+ "grad_norm": 2.5275073051452637,
386
  "learning_rate": 1.0074074074074074e-05,
387
+ "loss": 1.2358,
388
  "step": 540
389
  },
390
  {
391
  "epoch": 4.074074074074074,
392
+ "grad_norm": 2.648723840713501,
393
  "learning_rate": 9.333333333333334e-06,
394
+ "loss": 1.3655,
395
  "step": 550
396
  },
397
  {
398
  "epoch": 4.148148148148148,
399
+ "grad_norm": 2.6446926593780518,
400
  "learning_rate": 8.592592592592593e-06,
401
+ "loss": 1.397,
402
  "step": 560
403
  },
404
  {
405
  "epoch": 4.222222222222222,
406
+ "grad_norm": 2.8394277095794678,
407
  "learning_rate": 7.851851851851853e-06,
408
+ "loss": 1.2936,
409
  "step": 570
410
  },
411
  {
412
  "epoch": 4.296296296296296,
413
+ "grad_norm": 2.7919442653656006,
414
  "learning_rate": 7.111111111111112e-06,
415
+ "loss": 1.207,
416
  "step": 580
417
  },
418
  {
419
  "epoch": 4.37037037037037,
420
+ "grad_norm": 2.8117082118988037,
421
  "learning_rate": 6.370370370370371e-06,
422
+ "loss": 1.1337,
423
  "step": 590
424
  },
425
  {
426
  "epoch": 4.444444444444445,
427
+ "grad_norm": 3.2036752700805664,
428
  "learning_rate": 5.62962962962963e-06,
429
+ "loss": 1.4294,
430
  "step": 600
431
  },
432
  {
433
  "epoch": 4.518518518518518,
434
+ "grad_norm": 2.448761224746704,
435
  "learning_rate": 4.888888888888889e-06,
436
+ "loss": 1.2775,
437
  "step": 610
438
  },
439
  {
440
  "epoch": 4.592592592592593,
441
+ "grad_norm": 2.883207082748413,
442
  "learning_rate": 4.1481481481481485e-06,
443
+ "loss": 1.2913,
444
  "step": 620
445
  },
446
  {
447
  "epoch": 4.666666666666667,
448
+ "grad_norm": 3.2061944007873535,
449
  "learning_rate": 3.4074074074074077e-06,
450
+ "loss": 1.3502,
451
  "step": 630
452
  },
453
  {
454
  "epoch": 4.7407407407407405,
455
+ "grad_norm": 2.664846181869507,
456
  "learning_rate": 2.666666666666667e-06,
457
+ "loss": 1.3651,
458
  "step": 640
459
  },
460
  {
461
  "epoch": 4.814814814814815,
462
+ "grad_norm": 2.967418909072876,
463
  "learning_rate": 1.925925925925926e-06,
464
+ "loss": 1.2735,
465
  "step": 650
466
  },
467
  {
468
  "epoch": 4.888888888888889,
469
+ "grad_norm": 2.67146372795105,
470
  "learning_rate": 1.1851851851851852e-06,
471
+ "loss": 1.2634,
472
  "step": 660
473
  },
474
  {
475
  "epoch": 4.962962962962963,
476
+ "grad_norm": 2.5436322689056396,
477
  "learning_rate": 4.444444444444445e-07,
478
+ "loss": 1.2519,
479
  "step": 670
480
  }
481
  ],
 
  "attributes": {}
  }
  },
+ "total_flos": 1.17089345470464e+16,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
checkpoint-675/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e97160594c45d89ea3fd9c68265c308064022978d7a1e8dc093e9bed35cf1cf7
+ oid sha256:c06b70c6d22e534aee54e60ea3091f1eeba55994a544d47464b4a805ef2ab30e
  size 5304
runs/Jun08_08-35-32_1af0c0439b63/events.out.tfevents.1749371737.1af0c0439b63.503.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b1b74493478c978c6c61600c221a675dd0467e6e0561fffd9a706aed1723483c
+ size 19595