Uploading model files

Files changed (13)

.gitattributes +1 -0
README.md +202 -0
adapter_config.json +31 -0
adapter_model.safetensors +3 -0
added_tokens.json +24 -0
chat_template.jinja +54 -0
merges.txt +0 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +208 -0
trainer_state.json +3694 -0
training_args.bin +3 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-7B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2

adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "down_proj",
+    "gate_up_proj",
+    "qkv_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": true
+}
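
Because `use_rslora` is true in the config above, the adapter update is scaled by `lora_alpha / sqrt(r)` (rank-stabilized LoRA) rather than the classic `lora_alpha / r`. The following is a minimal sketch of that arithmetic using only the values shown in the config; the helper name `lora_scaling` is hypothetical and is not a PEFT function.

```python
import math

# Values copied from adapter_config.json in this commit.
config = {"lora_alpha": 32, "r": 32, "use_rslora": True}


def lora_scaling(cfg):
    """Return the multiplier applied to the low-rank update B @ A.

    Hypothetical helper for illustration: rsLoRA divides alpha by sqrt(r),
    classic LoRA divides alpha by r.
    """
    r = cfg["r"]
    denominator = math.sqrt(r) if cfg["use_rslora"] else r
    return cfg["lora_alpha"] / denominator


scaling = lora_scaling(config)
classic = lora_scaling({**config, "use_rslora": False})
print(f"rsLoRA scaling: {scaling:.4f}, classic scaling: {classic:.4f}")
```

With `r` equal to `lora_alpha`, the classic factor would be exactly 1, so enabling rsLoRA here makes the adapter's contribution noticeably larger than the classic setting would.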

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:22e19bbe399dae61169a4496a3473373b77047f8b4176f804b5f3b356bebb760
+size 106445440
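
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. As an illustrative sketch, the pointer can be parsed as follows; `parse_lfs_pointer` is a hypothetical helper, and the text is copied from the diff.

```python
# Pointer text copied from the adapter_model.safetensors diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:22e19bbe399dae61169a4496a3473373b77047f8b4176f804b5f3b356bebb760
size 106445440
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its key/value fields (hypothetical helper)."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], f"{pointer['size'] / 2**20:.1f} MiB")
```

The digest can be compared against a SHA-256 of the downloaded file to confirm that the adapter weights were fetched intact.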

added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
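
To make the template's output format concrete, here is a minimal pure-Python sketch of its simplest path only: no `tools`, no assistant `tool_calls`, and no `tool` messages. The function `render_chatml` is a hypothetical stand-in written for illustration; it does not replace the Jinja template or the tokenizer's own rendering.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Mirror the template's no-tools branch (hypothetical helper).

    Assumes every message has string 'role' and 'content' fields and that
    no tool calls or tool responses are present.
    """
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        remaining = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        remaining = messages
    for message in remaining:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hi"}])
print(prompt)
```

Each turn is wrapped in `<|im_start|>` and `<|im_end|>`, and a default system prompt is injected whenever the conversation does not start with a system message.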

merges.txt ADDED

The diff for this file is too large to render.

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

tokenizer_config.json ADDED

	@@ -0,0 +1,208 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

trainer_state.json ADDED

	@@ -0,0 +1,3694 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.5017840375586853,
+  "eval_steps": 50,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.003004694835680751,
+      "grad_norm": 3.345860242843628,
+      "learning_rate": 4.477611940298507e-06,
+      "loss": 3.0334,
+      "step": 2
+    },
+    {
+      "epoch": 0.006009389671361502,
+      "grad_norm": 3.1856071949005127,
+      "learning_rate": 1.343283582089552e-05,
+      "loss": 3.1443,
+      "step": 4
+    },
+    {
+      "epoch": 0.009014084507042254,
+      "grad_norm": 2.940918207168579,
+      "learning_rate": 2.2388059701492532e-05,
+      "loss": 2.9871,
+      "step": 6
+    },
+    {
+      "epoch": 0.012018779342723005,
+      "grad_norm": 1.7457096576690674,
+      "learning_rate": 3.134328358208955e-05,
+      "loss": 2.8271,
+      "step": 8
+    },
+    {
+      "epoch": 0.015023474178403756,
+      "grad_norm": 0.6361491680145264,
+      "learning_rate": 4.029850746268656e-05,
+      "loss": 2.6946,
+      "step": 10
+    },
+    {
+      "epoch": 0.018028169014084508,
+      "grad_norm": 0.7177234888076782,
+      "learning_rate": 4.925373134328358e-05,
+      "loss": 2.6783,
+      "step": 12
+    },
+    {
+      "epoch": 0.021032863849765257,
+      "grad_norm": 0.8950934410095215,
+      "learning_rate": 5.8208955223880594e-05,
+      "loss": 2.6021,
+      "step": 14
+    },
+    {
+      "epoch": 0.02403755868544601,
+      "grad_norm": 0.7039217352867126,
+      "learning_rate": 6.716417910447761e-05,
+      "loss": 2.5842,
+      "step": 16
+    },
+    {
+      "epoch": 0.027042253521126762,
+      "grad_norm": 0.36443546414375305,
+      "learning_rate": 7.611940298507463e-05,
+      "loss": 2.5101,
+      "step": 18
+    },
+    {
+      "epoch": 0.03004694835680751,
+      "grad_norm": 0.4032372832298279,
+      "learning_rate": 8.507462686567163e-05,
+      "loss": 2.4882,
+      "step": 20
+    },
+    {
+      "epoch": 0.03305164319248826,
+      "grad_norm": 0.44657716155052185,
+      "learning_rate": 9.402985074626866e-05,
+      "loss": 2.5475,
+      "step": 22
+    },
+    {
+      "epoch": 0.036056338028169016,
+      "grad_norm": 0.37746167182922363,
+      "learning_rate": 0.00010298507462686566,
+      "loss": 2.5153,
+      "step": 24
+    },
+    {
+      "epoch": 0.039061032863849765,
+      "grad_norm": 0.3511013388633728,
+      "learning_rate": 0.00011194029850746269,
+      "loss": 2.4901,
+      "step": 26
+    },
+    {
+      "epoch": 0.042065727699530514,
+      "grad_norm": 0.3548150062561035,
+      "learning_rate": 0.00012089552238805969,
+      "loss": 2.4899,
+      "step": 28
+    },
+    {
+      "epoch": 0.04507042253521127,
+      "grad_norm": 0.34047168493270874,
+      "learning_rate": 0.0001298507462686567,
+      "loss": 2.5165,
+      "step": 30
+    },
+    {
+      "epoch": 0.04807511737089202,
+      "grad_norm": 0.328000009059906,
+      "learning_rate": 0.00013880597014925372,
+      "loss": 2.4753,
+      "step": 32
+    },
+    {
+      "epoch": 0.05107981220657277,
+      "grad_norm": 0.3316989243030548,
+      "learning_rate": 0.00014776119402985072,
+      "loss": 2.4151,
+      "step": 34
+    },
+    {
+      "epoch": 0.054084507042253524,
+      "grad_norm": 0.32479584217071533,
+      "learning_rate": 0.00015671641791044772,
+      "loss": 2.3567,
+      "step": 36
+    },
+    {
+      "epoch": 0.05708920187793427,
+      "grad_norm": 0.32302042841911316,
+      "learning_rate": 0.00016567164179104478,
+      "loss": 2.5137,
+      "step": 38
+    },
+    {
+      "epoch": 0.06009389671361502,
+      "grad_norm": 0.3057011365890503,
+      "learning_rate": 0.00017462686567164178,
+      "loss": 2.4078,
+      "step": 40
+    },
+    {
+      "epoch": 0.06309859154929577,
+      "grad_norm": 0.30654028058052063,
+      "learning_rate": 0.00018358208955223879,
+      "loss": 2.3863,
+      "step": 42
+    },
+    {
+      "epoch": 0.06610328638497652,
+      "grad_norm": 0.29615163803100586,
+      "learning_rate": 0.0001925373134328358,
+      "loss": 2.4222,
+      "step": 44
+    },
+    {
+      "epoch": 0.06910798122065728,
+      "grad_norm": 0.2895660102367401,
+      "learning_rate": 0.00020149253731343284,
+      "loss": 2.4977,
+      "step": 46
+    },
+    {
+      "epoch": 0.07211267605633803,
+      "grad_norm": 0.28147512674331665,
+      "learning_rate": 0.00021044776119402985,
+      "loss": 2.4414,
+      "step": 48
+    },
+    {
+      "epoch": 0.07511737089201878,
+      "grad_norm": 0.2934994399547577,
+      "learning_rate": 0.00021940298507462685,
+      "loss": 2.4121,
+      "step": 50
+    },
+    {
+      "epoch": 0.07511737089201878,
+      "eval_loss": 2.314633369445801,
+      "eval_runtime": 2.2345,
+      "eval_samples_per_second": 15.216,
+      "eval_steps_per_second": 1.343,
+      "step": 50
+    },
+    {
+      "epoch": 0.07812206572769953,
+      "grad_norm": 0.2928488552570343,
+      "learning_rate": 0.00022835820895522385,
+      "loss": 2.317,
+      "step": 52
+    },
+    {
+      "epoch": 0.08112676056338028,
+      "grad_norm": 0.3315216302871704,
+      "learning_rate": 0.00023731343283582085,
+      "loss": 2.2459,
+      "step": 54
+    },
+    {
+      "epoch": 0.08413145539906103,
+      "grad_norm": 0.3117794990539551,
+      "learning_rate": 0.0002462686567164179,
+      "loss": 2.4086,
+      "step": 56
+    },
+    {
+      "epoch": 0.08713615023474178,
+      "grad_norm": 0.31090959906578064,
+      "learning_rate": 0.0002552238805970149,
+      "loss": 2.3513,
+      "step": 58
+    },
+    {
+      "epoch": 0.09014084507042254,
+      "grad_norm": 0.3119109869003296,
+      "learning_rate": 0.0002641791044776119,
+      "loss": 2.3925,
+      "step": 60
+    },
+    {
+      "epoch": 0.09314553990610329,
+      "grad_norm": 0.312026709318161,
+      "learning_rate": 0.0002731343283582089,
+      "loss": 2.4096,
+      "step": 62
+    },
+    {
+      "epoch": 0.09615023474178404,
+      "grad_norm": 0.32059046626091003,
+      "learning_rate": 0.00028208955223880597,
+      "loss": 2.308,
+      "step": 64
+    },
+    {
+      "epoch": 0.09915492957746479,
+      "grad_norm": 0.31604453921318054,
+      "learning_rate": 0.00029104477611940297,
+      "loss": 2.3848,
+      "step": 66
+    },
+    {
+      "epoch": 0.10215962441314554,
+      "grad_norm": 0.35594430565834045,
+      "learning_rate": 0.0003,
+      "loss": 2.3447,
+      "step": 68
+    },
+    {
+      "epoch": 0.10516431924882629,
+      "grad_norm": 0.35272663831710815,
+      "learning_rate": 0.0002999981497131758,
+      "loss": 2.3022,
+      "step": 70
+    },
+    {
+      "epoch": 0.10816901408450705,
+      "grad_norm": 0.36890238523483276,
+      "learning_rate": 0.000299992598898351,
+      "loss": 2.3401,
+      "step": 72
+    },
+    {
+      "epoch": 0.1111737089201878,
+      "grad_norm": 0.3586670160293579,
+      "learning_rate": 0.0002999833476924667,
+      "loss": 2.3703,
+      "step": 74
+    },
+    {
+      "epoch": 0.11417840375586855,
+      "grad_norm": 0.32580092549324036,
+      "learning_rate": 0.0002999703963237548,
+      "loss": 2.3269,
+      "step": 76
+    },
+    {
+      "epoch": 0.1171830985915493,
+      "grad_norm": 0.3425409197807312,
+      "learning_rate": 0.0002999537451117319,
+      "loss": 2.3419,
+      "step": 78
+    },
+    {
+      "epoch": 0.12018779342723004,
+      "grad_norm": 0.35141387581825256,
+      "learning_rate": 0.00029993339446719155,
+      "loss": 2.2929,
+      "step": 80
+    },
+    {
+      "epoch": 0.1231924882629108,
+      "grad_norm": 0.34454619884490967,
+      "learning_rate": 0.0002999093448921942,
+      "loss": 2.2712,
+      "step": 82
+    },
+    {
+      "epoch": 0.12619718309859154,
+      "grad_norm": 0.3879588842391968,
+      "learning_rate": 0.00029988159698005463,
+      "loss": 2.3465,
+      "step": 84
+    },
+    {
+      "epoch": 0.1292018779342723,
+      "grad_norm": 0.40479347109794617,
+      "learning_rate": 0.0002998501514153275,
+      "loss": 2.3002,
+      "step": 86
+    },
+    {
+      "epoch": 0.13220657276995304,
+      "grad_norm": 0.35989129543304443,
+      "learning_rate": 0.00029981500897379023,
+      "loss": 2.2845,
+      "step": 88
+    },
+    {
+      "epoch": 0.1352112676056338,
+      "grad_norm": 0.3578004240989685,
+      "learning_rate": 0.00029977617052242417,
+      "loss": 2.2632,
+      "step": 90
+    },
+    {
+      "epoch": 0.13821596244131457,
+      "grad_norm": 0.3477707505226135,
+      "learning_rate": 0.000299733637019393,
+      "loss": 2.2441,
+      "step": 92
+    },
+    {
+      "epoch": 0.14122065727699532,
+      "grad_norm": 0.36094820499420166,
+      "learning_rate": 0.00029968740951401914,
+      "loss": 2.294,
+      "step": 94
+    },
+    {
+      "epoch": 0.14422535211267606,
+      "grad_norm": 0.3719354271888733,
+      "learning_rate": 0.0002996374891467578,
+      "loss": 2.2513,
+      "step": 96
+    },
+    {
+      "epoch": 0.1472300469483568,
+      "grad_norm": 0.39865416288375854,
+      "learning_rate": 0.000299583877149169,
+      "loss": 2.294,
+      "step": 98
+    },
+    {
+      "epoch": 0.15023474178403756,
+      "grad_norm": 0.34469255805015564,
+      "learning_rate": 0.000299526574843887,
+      "loss": 2.2629,
+      "step": 100
+    },
+    {
+      "epoch": 0.15023474178403756,
+      "eval_loss": 2.2252323627471924,
+      "eval_runtime": 2.2926,
+      "eval_samples_per_second": 14.83,
+      "eval_steps_per_second": 1.309,
+      "step": 100
+    },
+    {
+      "epoch": 0.1532394366197183,
+      "grad_norm": 0.34142881631851196,
+      "learning_rate": 0.0002994655836445878,
+      "loss": 2.2671,
+      "step": 102
+    },
+    {
+      "epoch": 0.15624413145539906,
+      "grad_norm": 0.3718002736568451,
+      "learning_rate": 0.00029940090505595424,
+      "loss": 2.2759,
+      "step": 104
+    },
+    {
+      "epoch": 0.1592488262910798,
+      "grad_norm": 0.35945722460746765,
+      "learning_rate": 0.00029933254067363886,
+      "loss": 2.2937,
+      "step": 106
+    },
+    {
+      "epoch": 0.16225352112676056,
+      "grad_norm": 0.34711506962776184,
+      "learning_rate": 0.0002992604921842246,
+      "loss": 2.32,
+      "step": 108
+    },
+    {
+      "epoch": 0.1652582159624413,
+      "grad_norm": 0.35870376229286194,
+      "learning_rate": 0.000299184761365183,
+      "loss": 2.2718,
+      "step": 110
+    },
+    {
+      "epoch": 0.16826291079812206,
+      "grad_norm": 0.3389629125595093,
+      "learning_rate": 0.0002991053500848305,
+      "loss": 2.3221,
+      "step": 112
+    },
+    {
+      "epoch": 0.1712676056338028,
+      "grad_norm": 0.3516780734062195,
+      "learning_rate": 0.00029902226030228247,
+      "loss": 2.3205,
+      "step": 114
+    },
+    {
+      "epoch": 0.17427230046948355,
+      "grad_norm": 0.3356691300868988,
+      "learning_rate": 0.0002989354940674046,
+      "loss": 2.1508,
+      "step": 116
+    },
+    {
+      "epoch": 0.17727699530516433,
+      "grad_norm": 0.34136658906936646,
+      "learning_rate": 0.00029884505352076264,
+      "loss": 2.2405,
+      "step": 118
+    },
+    {
+      "epoch": 0.18028169014084508,
+      "grad_norm": 0.40186941623687744,
+      "learning_rate": 0.00029875094089356903,
+      "loss": 2.2773,
+      "step": 120
+    },
+    {
+      "epoch": 0.18328638497652583,
+      "grad_norm": 0.3942573666572571,
+      "learning_rate": 0.00029865315850762864,
+      "loss": 2.1983,
+      "step": 122
+    },
+    {
+      "epoch": 0.18629107981220658,
+      "grad_norm": 0.3947548568248749,
+      "learning_rate": 0.00029855170877528096,
+      "loss": 2.2913,
+      "step": 124
+    },
+    {
+      "epoch": 0.18929577464788733,
+      "grad_norm": 0.363520085811615,
+      "learning_rate": 0.00029844659419934056,
+      "loss": 2.2741,
+      "step": 126
+    },
+    {
+      "epoch": 0.19230046948356808,
+      "grad_norm": 0.36190399527549744,
+      "learning_rate": 0.0002983378173730359,
+      "loss": 2.3704,
+      "step": 128
+    },
+    {
+      "epoch": 0.19530516431924883,
+      "grad_norm": 0.416559636592865,
+      "learning_rate": 0.0002982253809799444,
+      "loss": 2.2279,
+      "step": 130
+    },
+    {
+      "epoch": 0.19830985915492957,
+      "grad_norm": 0.402997761964798,
+      "learning_rate": 0.0002981092877939272,
+      "loss": 2.2443,
+      "step": 132
+    },
+    {
+      "epoch": 0.20131455399061032,
+      "grad_norm": 0.3548254668712616,
+      "learning_rate": 0.0002979895406790603,
+      "loss": 2.221,
+      "step": 134
+    },
+    {
+      "epoch": 0.20431924882629107,
+      "grad_norm": 0.3473295569419861,
+      "learning_rate": 0.0002978661425895637,
+      "loss": 2.1603,
+      "step": 136
+    },
+    {
+      "epoch": 0.20732394366197182,
+      "grad_norm": 0.39127662777900696,
+      "learning_rate": 0.0002977390965697288,
+      "loss": 2.2464,
+      "step": 138
+    },
+    {
+      "epoch": 0.21032863849765257,
+      "grad_norm": 0.39523211121559143,
+      "learning_rate": 0.0002976084057538435,
+      "loss": 2.1841,
+      "step": 140
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 0.34574830532073975,
+      "learning_rate": 0.0002974740733661142,
+      "loss": 2.2044,
+      "step": 142
+    },
+    {
+      "epoch": 0.2163380281690141,
+      "grad_norm": 0.37754490971565247,
+      "learning_rate": 0.00029733610272058685,
+      "loss": 2.2254,
+      "step": 144
+    },
+    {
+      "epoch": 0.21934272300469485,
+      "grad_norm": 0.3925940990447998,
+      "learning_rate": 0.00029719449722106524,
+      "loss": 2.1979,
+      "step": 146
+    },
+    {
+      "epoch": 0.2223474178403756,
+      "grad_norm": 0.39449626207351685,
+      "learning_rate": 0.0002970492603610264,
+      "loss": 2.2307,
+      "step": 148
+    },
+    {
+      "epoch": 0.22535211267605634,
+      "grad_norm": 0.37469711899757385,
+      "learning_rate": 0.00029690039572353495,
+      "loss": 2.1796,
+      "step": 150
+    },
+    {
+      "epoch": 0.22535211267605634,
+      "eval_loss": 2.181159019470215,
+      "eval_runtime": 2.2259,
+      "eval_samples_per_second": 15.275,
+      "eval_steps_per_second": 1.348,
+      "step": 150
+    },
+    {
+      "epoch": 0.2283568075117371,
+      "grad_norm": 0.3608339726924896,
+      "learning_rate": 0.0002967479069811546,
+      "loss": 2.2276,
+      "step": 152
+    },
+    {
+      "epoch": 0.23136150234741784,
+      "grad_norm": 0.3705087900161743,
+      "learning_rate": 0.0002965917978958576,
+      "loss": 2.2385,
+      "step": 154
+    },
+    {
+      "epoch": 0.2343661971830986,
+      "grad_norm": 0.3505542576313019,
+      "learning_rate": 0.00029643207231893167,
+      "loss": 2.2167,
+      "step": 156
+    },
+    {
+      "epoch": 0.23737089201877934,
+      "grad_norm": 0.3650299906730652,
+      "learning_rate": 0.0002962687341908852,
+      "loss": 2.2137,
+      "step": 158
+    },
+    {
+      "epoch": 0.2403755868544601,
+      "grad_norm": 0.3850683569908142,
+      "learning_rate": 0.00029610178754135,
+      "loss": 2.2037,
+      "step": 160
+    },
+    {
+      "epoch": 0.24338028169014084,
+      "grad_norm": 0.3920019865036011,
+      "learning_rate": 0.0002959312364889819,
+      "loss": 2.2262,
+      "step": 162
+    },
+    {
+      "epoch": 0.2463849765258216,
+      "grad_norm": 0.35709312558174133,
+      "learning_rate": 0.0002957570852413591,
+      "loss": 2.1964,
+      "step": 164
+    },
+    {
+      "epoch": 0.24938967136150234,
+      "grad_norm": 0.3626340925693512,
+      "learning_rate": 0.0002955793380948784,
+      "loss": 2.1835,
+      "step": 166
+    },
+    {
+      "epoch": 0.2523943661971831,
+      "grad_norm": 0.3944832682609558,
+      "learning_rate": 0.0002953979994346492,
+      "loss": 2.2062,
+      "step": 168
+    },
+    {
+      "epoch": 0.25539906103286386,
+      "grad_norm": 0.3656717538833618,
+      "learning_rate": 0.0002952130737343852,
+      "loss": 2.1978,
+      "step": 170
+    },
+    {
+      "epoch": 0.2584037558685446,
+      "grad_norm": 0.37548840045928955,
+      "learning_rate": 0.0002950245655562943,
+      "loss": 2.2321,
+      "step": 172
+    },
+    {
+      "epoch": 0.26140845070422536,
+      "grad_norm": 0.3501577079296112,
+      "learning_rate": 0.00029483247955096575,
+      "loss": 2.2395,
+      "step": 174
+    },
+    {
+      "epoch": 0.2644131455399061,
+      "grad_norm": 0.3411433696746826,
+      "learning_rate": 0.0002946368204572556,
+      "loss": 2.1102,
+      "step": 176
+    },
+    {
+      "epoch": 0.26741784037558686,
+      "grad_norm": 0.3664512634277344,
+      "learning_rate": 0.0002944375931021699,
+      "loss": 2.194,
+      "step": 178
+    },
+    {
+      "epoch": 0.2704225352112676,
+      "grad_norm": 0.3627581298351288,
+      "learning_rate": 0.0002942348024007451,
+      "loss": 2.2813,
+      "step": 180
+    },
+    {
+      "epoch": 0.27342723004694836,
+      "grad_norm": 0.37071582674980164,
+      "learning_rate": 0.00029402845335592756,
+      "loss": 2.2711,
+      "step": 182
+    },
+    {
+      "epoch": 0.27643192488262913,
+      "grad_norm": 0.381433367729187,
+      "learning_rate": 0.00029381855105844947,
+      "loss": 2.2212,
+      "step": 184
+    },
+    {
+      "epoch": 0.27943661971830985,
+      "grad_norm": 0.3830112814903259,
+      "learning_rate": 0.0002936051006867035,
+      "loss": 2.2213,
+      "step": 186
+    },
+    {
+      "epoch": 0.28244131455399063,
+      "grad_norm": 0.39527764916419983,
+      "learning_rate": 0.0002933881075066152,
+      "loss": 2.2368,
+      "step": 188
+    },
+    {
+      "epoch": 0.28544600938967135,
+      "grad_norm": 0.4096282422542572,
+      "learning_rate": 0.00029316757687151285,
+      "loss": 2.2056,
+      "step": 190
+    },
+    {
+      "epoch": 0.28845070422535213,
+      "grad_norm": 0.3712705373764038,
+      "learning_rate": 0.0002929435142219955,
+      "loss": 2.198,
+      "step": 192
+    },
+    {
+      "epoch": 0.29145539906103285,
+      "grad_norm": 0.3953078091144562,
+      "learning_rate": 0.0002927159250857987,
+      "loss": 2.3001,
+      "step": 194
+    },
+    {
+      "epoch": 0.2944600938967136,
+      "grad_norm": 0.37445083260536194,
+      "learning_rate": 0.00029248481507765817,
+      "loss": 2.1669,
+      "step": 196
+    },
+    {
+      "epoch": 0.29746478873239435,
+      "grad_norm": 0.3499473035335541,
+      "learning_rate": 0.00029225018989917134,
+      "loss": 2.2206,
+      "step": 198
+    },
+    {
+      "epoch": 0.3004694835680751,
+      "grad_norm": 0.39210423827171326,
+      "learning_rate": 0.00029201205533865653,
+      "loss": 2.2343,
+      "step": 200
+    },
+    {
+      "epoch": 0.3004694835680751,
+      "eval_loss": 2.141900062561035,
+      "eval_runtime": 2.2356,
+      "eval_samples_per_second": 15.208,
+      "eval_steps_per_second": 1.342,
+      "step": 200
+    },
+    {
+      "epoch": 0.30347417840375585,
+      "grad_norm": 0.39515823125839233,
+      "learning_rate": 0.0002917704172710103,
+      "loss": 2.2075,
+      "step": 202
+    },
+    {
+      "epoch": 0.3064788732394366,
+      "grad_norm": 0.3761007785797119,
+      "learning_rate": 0.00029152528165756234,
+      "loss": 2.1649,
+      "step": 204
+    },
+    {
+      "epoch": 0.30948356807511734,
+      "grad_norm": 0.36122599244117737,
+      "learning_rate": 0.0002912766545459287,
+      "loss": 2.1931,
+      "step": 206
+    },
+    {
+      "epoch": 0.3124882629107981,
+      "grad_norm": 0.35382214188575745,
+      "learning_rate": 0.00029102454206986217,
+      "loss": 2.2054,
+      "step": 208
+    },
+    {
+      "epoch": 0.3154929577464789,
+      "grad_norm": 0.39906132221221924,
+      "learning_rate": 0.0002907689504491015,
+      "loss": 2.0899,
+      "step": 210
+    },
+    {
+      "epoch": 0.3184976525821596,
+      "grad_norm": 0.38114792108535767,
+      "learning_rate": 0.00029050988598921726,
+      "loss": 2.219,
+      "step": 212
+    },
+    {
+      "epoch": 0.3215023474178404,
+      "grad_norm": 0.3845846951007843,
+      "learning_rate": 0.00029024735508145696,
+      "loss": 2.2918,
+      "step": 214
+    },
+    {
+      "epoch": 0.3245070422535211,
+      "grad_norm": 0.3749115765094757,
+      "learning_rate": 0.00028998136420258706,
+      "loss": 2.1253,
+      "step": 216
+    },
+    {
+      "epoch": 0.3275117370892019,
+      "grad_norm": 0.3539542853832245,
+      "learning_rate": 0.00028971191991473304,
+      "loss": 2.2459,
+      "step": 218
+    },
+    {
+      "epoch": 0.3305164319248826,
+      "grad_norm": 0.3546983599662781,
+      "learning_rate": 0.0002894390288652179,
+      "loss": 2.0911,
+      "step": 220
+    },
+    {
+      "epoch": 0.3335211267605634,
+      "grad_norm": 0.40299192070961,
+      "learning_rate": 0.00028916269778639765,
+      "loss": 2.1947,
+      "step": 222
+    },
+    {
+      "epoch": 0.3365258215962441,
+      "grad_norm": 0.3689326345920563,
+      "learning_rate": 0.00028888293349549576,
+      "loss": 2.1966,
+      "step": 224
+    },
+    {
+      "epoch": 0.3395305164319249,
+      "grad_norm": 0.3649276793003082,
+      "learning_rate": 0.0002885997428944347,
+      "loss": 2.2392,
+      "step": 226
+    },
+    {
+      "epoch": 0.3425352112676056,
+      "grad_norm": 0.38962477445602417,
+      "learning_rate": 0.0002883131329696656,
+      "loss": 2.1984,
+      "step": 228
+    },
+    {
+      "epoch": 0.3455399061032864,
+      "grad_norm": 0.37435683608055115,
+      "learning_rate": 0.000288023110791996,
+      "loss": 2.1761,
+      "step": 230
+    },
+    {
+      "epoch": 0.3485446009389671,
+      "grad_norm": 0.3510904610157013,
+      "learning_rate": 0.0002877296835164155,
+      "loss": 2.1674,
+      "step": 232
+    },
+    {
+      "epoch": 0.3515492957746479,
+      "grad_norm": 0.36861664056777954,
+      "learning_rate": 0.00028743285838191894,
+      "loss": 2.2275,
+      "step": 234
+    },
+    {
+      "epoch": 0.35455399061032866,
+      "grad_norm": 0.3731881380081177,
+      "learning_rate": 0.00028713264271132817,
+      "loss": 2.1107,
+      "step": 236
+    },
+    {
+      "epoch": 0.3575586854460094,
+      "grad_norm": 0.3827975392341614,
+      "learning_rate": 0.00028682904391111124,
+      "loss": 2.1624,
+      "step": 238
+    },
+    {
+      "epoch": 0.36056338028169016,
+      "grad_norm": 0.35969778895378113,
+      "learning_rate": 0.0002865220694711996,
+      "loss": 2.1862,
+      "step": 240
+    },
+    {
+      "epoch": 0.3635680751173709,
+      "grad_norm": 0.3840770721435547,
+      "learning_rate": 0.0002862117269648033,
+      "loss": 2.1971,
+      "step": 242
+    },
+    {
+      "epoch": 0.36657276995305166,
+      "grad_norm": 0.3853059709072113,
+      "learning_rate": 0.00028589802404822455,
+      "loss": 2.1691,
+      "step": 244
+    },
+    {
+      "epoch": 0.3695774647887324,
+      "grad_norm": 0.3811451196670532,
+      "learning_rate": 0.00028558096846066807,
+      "loss": 2.1574,
+      "step": 246
+    },
+    {
+      "epoch": 0.37258215962441316,
+      "grad_norm": 0.3676210939884186,
+      "learning_rate": 0.00028526056802405104,
+      "loss": 2.2257,
+      "step": 248
+    },
+    {
+      "epoch": 0.3755868544600939,
+      "grad_norm": 0.3635883033275604,
+      "learning_rate": 0.0002849368306428096,
+      "loss": 2.2546,
+      "step": 250
+    },
+    {
+      "epoch": 0.3755868544600939,
+      "eval_loss": 2.0982940196990967,
+      "eval_runtime": 2.2318,
+      "eval_samples_per_second": 15.234,
+      "eval_steps_per_second": 1.344,
+      "step": 250
+    },
+    {
+      "epoch": 0.37859154929577465,
+      "grad_norm": 0.36666548252105713,
+      "learning_rate": 0.0002846097643037037,
+      "loss": 2.2235,
+      "step": 252
+    },
+    {
+      "epoch": 0.3815962441314554,
+      "grad_norm": 0.37013521790504456,
+      "learning_rate": 0.0002842793770756207,
+      "loss": 2.2266,
+      "step": 254
+    },
+    {
+      "epoch": 0.38460093896713615,
+      "grad_norm": 0.3608769178390503,
+      "learning_rate": 0.00028394567710937564,
+      "loss": 2.162,
+      "step": 256
+    },
+    {
+      "epoch": 0.38760563380281693,
+      "grad_norm": 0.37627193331718445,
+      "learning_rate": 0.00028360867263751055,
+      "loss": 2.1686,
+      "step": 258
+    },
+    {
+      "epoch": 0.39061032863849765,
+      "grad_norm": 0.38580891489982605,
+      "learning_rate": 0.00028326837197409116,
+      "loss": 2.1346,
+      "step": 260
+    },
+    {
+      "epoch": 0.3936150234741784,
+      "grad_norm": 0.3952076733112335,
+      "learning_rate": 0.000282924783514502,
+      "loss": 2.1393,
+      "step": 262
+    },
+    {
+      "epoch": 0.39661971830985915,
+      "grad_norm": 0.3873717188835144,
+      "learning_rate": 0.00028257791573523905,
+      "loss": 2.1883,
+      "step": 264
+    },
+    {
+      "epoch": 0.3996244131455399,
+      "grad_norm": 0.390897661447525,
+      "learning_rate": 0.0002822277771937007,
+      "loss": 2.1987,
+      "step": 266
+    },
+    {
+      "epoch": 0.40262910798122065,
+      "grad_norm": 0.4009636342525482,
+      "learning_rate": 0.0002818743765279767,
+      "loss": 2.1993,
+      "step": 268
+    },
+    {
+      "epoch": 0.4056338028169014,
+      "grad_norm": 0.3799436092376709,
+      "learning_rate": 0.00028151772245663505,
+      "loss": 2.2225,
+      "step": 270
+    },
+    {
+      "epoch": 0.40863849765258214,
+      "grad_norm": 0.3564237058162689,
+      "learning_rate": 0.0002811578237785067,
+      "loss": 2.1673,
+      "step": 272
+    },
+    {
+      "epoch": 0.4116431924882629,
+      "grad_norm": 0.37377825379371643,
+      "learning_rate": 0.0002807946893724688,
+      "loss": 2.1529,
+      "step": 274
+    },
+    {
+      "epoch": 0.41464788732394364,
+      "grad_norm": 0.3791629374027252,
+      "learning_rate": 0.00028042832819722536,
+      "loss": 2.1718,
+      "step": 276
+    },
+    {
+      "epoch": 0.4176525821596244,
+      "grad_norm": 0.35380980372428894,
+      "learning_rate": 0.0002800587492910866,
+      "loss": 2.142,
+      "step": 278
+    },
+    {
+      "epoch": 0.42065727699530514,
+      "grad_norm": 0.3733883202075958,
+      "learning_rate": 0.0002796859617717455,
+      "loss": 2.1176,
+      "step": 280
+    },
+    {
+      "epoch": 0.4236619718309859,
+      "grad_norm": 0.37677061557769775,
+      "learning_rate": 0.0002793099748360533,
+      "loss": 2.2441,
+      "step": 282
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 0.37634608149528503,
+      "learning_rate": 0.0002789307977597921,
+      "loss": 2.2257,
+      "step": 284
+    },
+    {
+      "epoch": 0.4296713615023474,
+      "grad_norm": 0.3666776120662689,
+      "learning_rate": 0.0002785484398974467,
+      "loss": 2.1931,
+      "step": 286
+    },
+    {
+      "epoch": 0.4326760563380282,
+      "grad_norm": 0.3721160292625427,
+      "learning_rate": 0.0002781629106819733,
+      "loss": 2.1501,
+      "step": 288
+    },
+    {
+      "epoch": 0.4356807511737089,
+      "grad_norm": 0.3687131702899933,
+      "learning_rate": 0.00027777421962456685,
+      "loss": 2.2567,
+      "step": 290
+    },
+    {
+      "epoch": 0.4386854460093897,
+      "grad_norm": 0.39323750138282776,
+      "learning_rate": 0.0002773823763144266,
+      "loss": 2.1721,
+      "step": 292
+    },
+    {
+      "epoch": 0.4416901408450704,
+      "grad_norm": 0.380220502614975,
+      "learning_rate": 0.0002769873904185195,
+      "loss": 2.1194,
+      "step": 294
+    },
+    {
+      "epoch": 0.4446948356807512,
+      "grad_norm": 0.3765008747577667,
+      "learning_rate": 0.0002765892716813414,
+      "loss": 2.1753,
+      "step": 296
+    },
+    {
+      "epoch": 0.4476995305164319,
+      "grad_norm": 0.3636697828769684,
+      "learning_rate": 0.0002761880299246772,
+      "loss": 2.115,
+      "step": 298
+    },
+    {
+      "epoch": 0.4507042253521127,
+      "grad_norm": 0.37536996603012085,
+      "learning_rate": 0.0002757836750473578,
+      "loss": 2.1901,
+      "step": 300
+    },
+    {
+      "epoch": 0.4507042253521127,
+      "eval_loss": 2.075345754623413,
+      "eval_runtime": 2.2318,
+      "eval_samples_per_second": 15.234,
+      "eval_steps_per_second": 1.344,
+      "step": 300
+    },
+    {
+      "epoch": 0.4537089201877934,
+      "grad_norm": 0.38492923974990845,
+      "learning_rate": 0.00027537621702501675,
+      "loss": 2.0889,
+      "step": 302
+    },
+    {
+      "epoch": 0.4567136150234742,
+      "grad_norm": 0.3728578984737396,
+      "learning_rate": 0.0002749656659098434,
+      "loss": 2.1575,
+      "step": 304
+    },
+    {
+      "epoch": 0.4597183098591549,
+      "grad_norm": 0.37799346446990967,
+      "learning_rate": 0.0002745520318303354,
+      "loss": 2.1455,
+      "step": 306
+    },
+    {
+      "epoch": 0.4627230046948357,
+      "grad_norm": 0.36139002442359924,
+      "learning_rate": 0.0002741353249910486,
+      "loss": 2.2059,
+      "step": 308
+    },
+    {
+      "epoch": 0.46572769953051646,
+      "grad_norm": 0.35566097497940063,
+      "learning_rate": 0.0002737155556723452,
+      "loss": 2.1419,
+      "step": 310
+    },
+    {
+      "epoch": 0.4687323943661972,
+      "grad_norm": 0.37092921137809753,
+      "learning_rate": 0.0002732927342301406,
+      "loss": 2.1868,
+      "step": 312
+    },
+    {
+      "epoch": 0.47173708920187796,
+      "grad_norm": 0.3878864347934723,
+      "learning_rate": 0.00027286687109564725,
+      "loss": 2.1406,
+      "step": 314
+    },
+    {
+      "epoch": 0.4747417840375587,
+      "grad_norm": 0.38121914863586426,
+      "learning_rate": 0.0002724379767751177,
+      "loss": 2.1543,
+      "step": 316
+    },
+    {
+      "epoch": 0.47774647887323946,
+      "grad_norm": 0.3697877824306488,
+      "learning_rate": 0.0002720060618495856,
+      "loss": 2.1297,
+      "step": 318
+    },
+    {
+      "epoch": 0.4807511737089202,
+      "grad_norm": 0.3632924556732178,
+      "learning_rate": 0.0002715711369746041,
+      "loss": 2.186,
+      "step": 320
+    },
+    {
+      "epoch": 0.48375586854460095,
+      "grad_norm": 0.37912604212760925,
+      "learning_rate": 0.0002711332128799834,
+      "loss": 2.1316,
+      "step": 322
+    },
+    {
+      "epoch": 0.4867605633802817,
+      "grad_norm": 0.3651125729084015,
+      "learning_rate": 0.0002706923003695261,
+      "loss": 2.1664,
+      "step": 324
+    },
+    {
+      "epoch": 0.48976525821596245,
+      "grad_norm": 0.37822335958480835,
+      "learning_rate": 0.00027024841032076007,
+      "loss": 2.1566,
+      "step": 326
+    },
+    {
+      "epoch": 0.4927699530516432,
+      "grad_norm": 0.3776490390300751,
+      "learning_rate": 0.0002698015536846709,
+      "loss": 2.1953,
+      "step": 328
+    },
+    {
+      "epoch": 0.49577464788732395,
+      "grad_norm": 0.3854661285877228,
+      "learning_rate": 0.0002693517414854312,
+      "loss": 2.2063,
+      "step": 330
+    },
+    {
+      "epoch": 0.49877934272300467,
+      "grad_norm": 0.368531197309494,
+      "learning_rate": 0.0002688989848201287,
+      "loss": 2.0906,
+      "step": 332
+    },
+    {
+      "epoch": 0.5017840375586854,
+      "grad_norm": 0.38510599732398987,
+      "learning_rate": 0.00026844329485849276,
+      "loss": 2.1932,
+      "step": 334
+    },
+    {
+      "epoch": 0.5047887323943662,
+      "grad_norm": 0.3947513997554779,
+      "learning_rate": 0.00026798468284261836,
+      "loss": 2.1378,
+      "step": 336
+    },
+    {
+      "epoch": 0.507793427230047,
+      "grad_norm": 0.3518299162387848,
+      "learning_rate": 0.00026752316008668916,
+      "loss": 2.1842,
+      "step": 338
+    },
+    {
+      "epoch": 0.5107981220657277,
+      "grad_norm": 0.3755107820034027,
+      "learning_rate": 0.0002670587379766981,
+      "loss": 2.2129,
+      "step": 340
+    },
+    {
+      "epoch": 0.5138028169014085,
+      "grad_norm": 0.3730200529098511,
+      "learning_rate": 0.0002665914279701668,
+      "loss": 2.2018,
+      "step": 342
+    },
+    {
+      "epoch": 0.5168075117370892,
+      "grad_norm": 0.380302757024765,
+      "learning_rate": 0.00026612124159586237,
+      "loss": 2.1619,
+      "step": 344
+    },
+    {
+      "epoch": 0.5198122065727699,
+      "grad_norm": 0.38186338543891907,
+      "learning_rate": 0.0002656481904535136,
+      "loss": 2.1248,
+      "step": 346
+    },
+    {
+      "epoch": 0.5228169014084507,
+      "grad_norm": 0.39248546957969666,
+      "learning_rate": 0.0002651722862135245,
+      "loss": 2.1109,
+      "step": 348
+    },
+    {
+      "epoch": 0.5258215962441315,
+      "grad_norm": 0.37442460656166077,
+      "learning_rate": 0.0002646935406166862,
+      "loss": 2.1597,
+      "step": 350
+    },
+    {
+      "epoch": 0.5258215962441315,
+      "eval_loss": 2.0383801460266113,
+      "eval_runtime": 2.239,
+      "eval_samples_per_second": 15.186,
+      "eval_steps_per_second": 1.34,
+      "step": 350
+    },
+    {
+      "epoch": 0.5288262910798122,
+      "grad_norm": 0.3747282922267914,
+      "learning_rate": 0.0002642119654738878,
+      "loss": 2.1359,
+      "step": 352
+    },
+    {
+      "epoch": 0.5318309859154929,
+      "grad_norm": 0.3742654025554657,
+      "learning_rate": 0.0002637275726658244,
+      "loss": 2.1655,
+      "step": 354
+    },
+    {
+      "epoch": 0.5348356807511737,
+      "grad_norm": 0.3777499198913574,
+      "learning_rate": 0.00026324037414270443,
+      "loss": 2.2144,
+      "step": 356
+    },
+    {
+      "epoch": 0.5378403755868545,
+      "grad_norm": 0.3784273862838745,
+      "learning_rate": 0.00026275038192395466,
+      "loss": 2.1617,
+      "step": 358
+    },
+    {
+      "epoch": 0.5408450704225352,
+      "grad_norm": 0.3833068311214447,
+      "learning_rate": 0.00026225760809792375,
+      "loss": 2.19,
+      "step": 360
+    },
+    {
+      "epoch": 0.5438497652582159,
+      "grad_norm": 0.3657234013080597,
+      "learning_rate": 0.0002617620648215839,
+      "loss": 2.1348,
+      "step": 362
+    },
+    {
+      "epoch": 0.5468544600938967,
+      "grad_norm": 0.36410313844680786,
+      "learning_rate": 0.00026126376432023104,
+      "loss": 2.1447,
+      "step": 364
+    },
+    {
+      "epoch": 0.5498591549295775,
+      "grad_norm": 0.37968766689300537,
+      "learning_rate": 0.0002607627188871832,
+      "loss": 2.1502,
+      "step": 366
+    },
+    {
+      "epoch": 0.5528638497652583,
+      "grad_norm": 0.3841196894645691,
+      "learning_rate": 0.0002602589408834772,
+      "loss": 2.1263,
+      "step": 368
+    },
+    {
+      "epoch": 0.5558685446009389,
+      "grad_norm": 0.38847631216049194,
+      "learning_rate": 0.00025975244273756376,
+      "loss": 2.1662,
+      "step": 370
+    },
+    {
+      "epoch": 0.5588732394366197,
+      "grad_norm": 0.3703446686267853,
+      "learning_rate": 0.00025924323694500093,
+      "loss": 2.1735,
+      "step": 372
+    },
+    {
+      "epoch": 0.5618779342723005,
+      "grad_norm": 0.3932661712169647,
+      "learning_rate": 0.0002587313360681454,
+      "loss": 2.2347,
+      "step": 374
+    },
+    {
+      "epoch": 0.5648826291079813,
+      "grad_norm": 0.3762649893760681,
+      "learning_rate": 0.00025821675273584335,
+      "loss": 2.0728,
+      "step": 376
+    },
+    {
+      "epoch": 0.5678873239436619,
+      "grad_norm": 0.3761097490787506,
+      "learning_rate": 0.0002576994996431181,
+      "loss": 2.0566,
+      "step": 378
+    },
+    {
+      "epoch": 0.5708920187793427,
+      "grad_norm": 0.3672586679458618,
+      "learning_rate": 0.0002571795895508575,
+      "loss": 2.139,
+      "step": 380
+    },
+    {
+      "epoch": 0.5738967136150235,
+      "grad_norm": 0.4048523008823395,
+      "learning_rate": 0.0002566570352854988,
+      "loss": 2.0188,
+      "step": 382
+    },
+    {
+      "epoch": 0.5769014084507043,
+      "grad_norm": 0.3703201711177826,
+      "learning_rate": 0.0002561318497387122,
+      "loss": 2.1636,
+      "step": 384
+    },
+    {
+      "epoch": 0.5799061032863849,
+      "grad_norm": 0.3803885579109192,
+      "learning_rate": 0.0002556040458670831,
+      "loss": 2.1196,
+      "step": 386
+    },
+    {
+      "epoch": 0.5829107981220657,
+      "grad_norm": 0.37113627791404724,
+      "learning_rate": 0.0002550736366917921,
+      "loss": 2.0984,
+      "step": 388
+    },
+    {
+      "epoch": 0.5859154929577465,
+      "grad_norm": 0.39332741498947144,
+      "learning_rate": 0.00025454063529829405,
+      "loss": 2.1744,
+      "step": 390
+    },
+    {
+      "epoch": 0.5889201877934273,
+      "grad_norm": 0.36778101325035095,
+      "learning_rate": 0.00025400505483599487,
+      "loss": 2.2,
+      "step": 392
+    },
+    {
+      "epoch": 0.591924882629108,
+      "grad_norm": 0.37352946400642395,
+      "learning_rate": 0.0002534669085179277,
+      "loss": 2.0796,
+      "step": 394
+    },
+    {
+      "epoch": 0.5949295774647887,
+      "grad_norm": 0.3829871118068695,
+      "learning_rate": 0.0002529262096204264,
+      "loss": 2.1883,
+      "step": 396
+    },
+    {
+      "epoch": 0.5979342723004695,
+      "grad_norm": 0.37720027565956116,
+      "learning_rate": 0.0002523829714827981,
+      "loss": 2.1122,
+      "step": 398
+    },
+    {
+      "epoch": 0.6009389671361502,
+      "grad_norm": 0.38344496488571167,
+      "learning_rate": 0.00025183720750699453,
+      "loss": 2.1739,
+      "step": 400
+    },
+    {
+      "epoch": 0.6009389671361502,
+      "eval_loss": 2.0227649211883545,
+      "eval_runtime": 2.2306,
+      "eval_samples_per_second": 15.242,
+      "eval_steps_per_second": 1.345,
+      "step": 400
+    },
+    {
+      "epoch": 0.603943661971831,
+      "grad_norm": 0.3744402229785919,
+      "learning_rate": 0.00025128893115728096,
+      "loss": 2.082,
+      "step": 402
+    },
+    {
+      "epoch": 0.6069483568075117,
+      "grad_norm": 0.3665165305137634,
+      "learning_rate": 0.000250738155959904,
+      "loss": 2.125,
+      "step": 404
+    },
+    {
+      "epoch": 0.6099530516431925,
+      "grad_norm": 0.38589879870414734,
+      "learning_rate": 0.00025018489550275824,
+      "loss": 2.1454,
+      "step": 406
+    },
+    {
+      "epoch": 0.6129577464788732,
+      "grad_norm": 0.37511923909187317,
+      "learning_rate": 0.0002496291634350509,
+      "loss": 2.1677,
+      "step": 408
+    },
+    {
+      "epoch": 0.615962441314554,
+      "grad_norm": 0.39144960045814514,
+      "learning_rate": 0.0002490709734669648,
+      "loss": 2.1845,
+      "step": 410
+    },
+    {
+      "epoch": 0.6189671361502347,
+      "grad_norm": 0.3829379081726074,
+      "learning_rate": 0.0002485103393693207,
+      "loss": 2.1483,
+      "step": 412
+    },
+    {
+      "epoch": 0.6219718309859155,
+      "grad_norm": 0.37375521659851074,
+      "learning_rate": 0.0002479472749732369,
+      "loss": 2.1219,
+      "step": 414
+    },
+    {
+      "epoch": 0.6249765258215962,
+      "grad_norm": 0.3679542541503906,
+      "learning_rate": 0.00024738179416978844,
+      "loss": 2.1488,
+      "step": 416
+    },
+    {
+      "epoch": 0.627981220657277,
+      "grad_norm": 0.3810802698135376,
+      "learning_rate": 0.0002468139109096646,
+      "loss": 2.1408,
+      "step": 418
+    },
+    {
+      "epoch": 0.6309859154929578,
+      "grad_norm": 0.3585708737373352,
+      "learning_rate": 0.00024624363920282413,
+      "loss": 2.0933,
+      "step": 420
+    },
+    {
+      "epoch": 0.6339906103286385,
+      "grad_norm": 0.39489805698394775,
+      "learning_rate": 0.00024567099311815,
+      "loss": 2.1737,
+      "step": 422
+    },
+    {
+      "epoch": 0.6369953051643192,
+      "grad_norm": 0.35941746830940247,
+      "learning_rate": 0.0002450959867831024,
+      "loss": 2.1852,
+      "step": 424
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 0.36616775393486023,
+      "learning_rate": 0.0002445186343833698,
+      "loss": 2.0507,
+      "step": 426
+    },
+    {
+      "epoch": 0.6430046948356808,
+      "grad_norm": 0.3893657624721527,
+      "learning_rate": 0.0002439389501625194,
+      "loss": 2.1827,
+      "step": 428
+    },
+    {
+      "epoch": 0.6460093896713615,
+      "grad_norm": 0.3760404586791992,
+      "learning_rate": 0.00024335694842164563,
+      "loss": 2.1185,
+      "step": 430
+    },
+    {
+      "epoch": 0.6490140845070422,
+      "grad_norm": 0.36795878410339355,
+      "learning_rate": 0.00024277264351901709,
+      "loss": 2.1081,
+      "step": 432
+    },
+    {
+      "epoch": 0.652018779342723,
+      "grad_norm": 0.3836841285228729,
+      "learning_rate": 0.00024218604986972267,
+      "loss": 2.1178,
+      "step": 434
+    },
+    {
+      "epoch": 0.6550234741784038,
+      "grad_norm": 0.36871835589408875,
+      "learning_rate": 0.00024159718194531572,
+      "loss": 2.1546,
+      "step": 436
+    },
+    {
+      "epoch": 0.6580281690140845,
+      "grad_norm": 0.37259966135025024,
+      "learning_rate": 0.00024100605427345703,
+      "loss": 2.0629,
+      "step": 438
+    },
+    {
+      "epoch": 0.6610328638497652,
+      "grad_norm": 0.3785538077354431,
+      "learning_rate": 0.00024041268143755646,
+      "loss": 2.0866,
+      "step": 440
+    },
+    {
+      "epoch": 0.664037558685446,
+      "grad_norm": 0.3723413646221161,
+      "learning_rate": 0.00023981707807641336,
+      "loss": 2.1177,
+      "step": 442
+    },
+    {
+      "epoch": 0.6670422535211268,
+      "grad_norm": 0.3624575436115265,
+      "learning_rate": 0.00023921925888385496,
+      "loss": 2.1133,
+      "step": 444
+    },
+    {
+      "epoch": 0.6700469483568076,
+      "grad_norm": 0.3712269067764282,
+      "learning_rate": 0.00023861923860837427,
+      "loss": 2.1705,
+      "step": 446
+    },
+    {
+      "epoch": 0.6730516431924882,
+      "grad_norm": 0.37567707896232605,
+      "learning_rate": 0.00023801703205276613,
+      "loss": 2.165,
+      "step": 448
+    },
+    {
+      "epoch": 0.676056338028169,
+      "grad_norm": 0.36910387873649597,
+      "learning_rate": 0.00023741265407376192,
+      "loss": 2.0883,
+      "step": 450
+    },
+    {
+      "epoch": 0.676056338028169,
+      "eval_loss": 2.003222942352295,
+      "eval_runtime": 2.2291,
+      "eval_samples_per_second": 15.253,
+      "eval_steps_per_second": 1.346,
+      "step": 450
+    },
+    {
+      "epoch": 0.6790610328638498,
+      "grad_norm": 0.3778355121612549,
+      "learning_rate": 0.00023680611958166312,
+      "loss": 2.1206,
+      "step": 452
+    },
+    {
+      "epoch": 0.6820657276995306,
+      "grad_norm": 0.3732253909111023,
+      "learning_rate": 0.00023619744353997347,
+      "loss": 2.12,
+      "step": 454
+    },
+    {
+      "epoch": 0.6850704225352112,
+      "grad_norm": 0.3719022572040558,
+      "learning_rate": 0.00023558664096502978,
+      "loss": 2.0978,
+      "step": 456
+    },
+    {
+      "epoch": 0.688075117370892,
+      "grad_norm": 0.38006502389907837,
+      "learning_rate": 0.00023497372692563143,
+      "loss": 2.128,
+      "step": 458
+    },
+    {
+      "epoch": 0.6910798122065728,
+      "grad_norm": 0.37536904215812683,
+      "learning_rate": 0.00023435871654266873,
+      "loss": 2.1125,
+      "step": 460
+    },
+    {
+      "epoch": 0.6940845070422536,
+      "grad_norm": 0.3833812475204468,
+      "learning_rate": 0.00023374162498874978,
+      "loss": 2.0893,
+      "step": 462
+    },
+    {
+      "epoch": 0.6970892018779342,
+      "grad_norm": 0.3699873387813568,
+      "learning_rate": 0.00023312246748782622,
+      "loss": 2.0756,
+      "step": 464
+    },
+    {
+      "epoch": 0.700093896713615,
+      "grad_norm": 0.4059182405471802,
+      "learning_rate": 0.0002325012593148176,
+      "loss": 2.1247,
+      "step": 466
+    },
+    {
+      "epoch": 0.7030985915492958,
+      "grad_norm": 0.39631035923957825,
+      "learning_rate": 0.00023187801579523446,
+      "loss": 2.1437,
+      "step": 468
+    },
+    {
+      "epoch": 0.7061032863849765,
+      "grad_norm": 0.38100457191467285,
+      "learning_rate": 0.00023125275230480056,
+      "loss": 2.0906,
+      "step": 470
+    },
+    {
+      "epoch": 0.7091079812206573,
+      "grad_norm": 0.3866036832332611,
+      "learning_rate": 0.0002306254842690732,
+      "loss": 2.1248,
+      "step": 472
+    },
+    {
+      "epoch": 0.712112676056338,
+      "grad_norm": 0.36718884110450745,
+      "learning_rate": 0.0002299962271630627,
+      "loss": 2.2143,
+      "step": 474
+    },
+    {
+      "epoch": 0.7151173708920188,
+      "grad_norm": 0.3714663088321686,
+      "learning_rate": 0.00022936499651085088,
+      "loss": 2.1471,
+      "step": 476
+    },
+    {
+      "epoch": 0.7181220657276995,
+      "grad_norm": 0.3928404748439789,
+      "learning_rate": 0.0002287318078852079,
+      "loss": 2.1203,
+      "step": 478
+    },
+    {
+      "epoch": 0.7211267605633803,
+      "grad_norm": 0.38624709844589233,
+      "learning_rate": 0.00022809667690720803,
+      "loss": 2.1655,
+      "step": 480
+    },
+    {
+      "epoch": 0.724131455399061,
+      "grad_norm": 0.356985479593277,
+      "learning_rate": 0.00022745961924584428,
+      "loss": 2.1478,
+      "step": 482
+    },
+    {
+      "epoch": 0.7271361502347418,
+      "grad_norm": 0.3587605059146881,
+      "learning_rate": 0.00022682065061764198,
+      "loss": 2.1355,
+      "step": 484
+    },
+    {
+      "epoch": 0.7301408450704225,
+      "grad_norm": 0.38690492510795593,
+      "learning_rate": 0.00022617978678627092,
+      "loss": 2.0885,
+      "step": 486
+    },
+    {
+      "epoch": 0.7331455399061033,
+      "grad_norm": 0.36528658866882324,
+      "learning_rate": 0.00022553704356215637,
+      "loss": 2.0842,
+      "step": 488
+    },
+    {
+      "epoch": 0.7361502347417841,
+      "grad_norm": 0.38041624426841736,
+      "learning_rate": 0.00022489243680208943,
+      "loss": 2.097,
+      "step": 490
+    },
+    {
+      "epoch": 0.7391549295774648,
+      "grad_norm": 0.3670444190502167,
+      "learning_rate": 0.0002242459824088353,
+      "loss": 2.0909,
+      "step": 492
+    },
+    {
+      "epoch": 0.7421596244131455,
+      "grad_norm": 0.39547091722488403,
+      "learning_rate": 0.00022359769633074122,
+      "loss": 2.0629,
+      "step": 494
+    },
+    {
+      "epoch": 0.7451643192488263,
+      "grad_norm": 0.39055827260017395,
+      "learning_rate": 0.00022294759456134304,
+      "loss": 2.14,
+      "step": 496
+    },
+    {
+      "epoch": 0.7481690140845071,
+      "grad_norm": 0.38088637590408325,
+      "learning_rate": 0.00022229569313897066,
+      "loss": 2.0859,
+      "step": 498
+    },
+    {
+      "epoch": 0.7511737089201878,
+      "grad_norm": 0.37030696868896484,
+      "learning_rate": 0.00022164200814635217,
+      "loss": 2.1438,
+      "step": 500
+    },
+    {
+      "epoch": 0.7511737089201878,
+      "eval_loss": 1.9595993757247925,
+      "eval_runtime": 2.2313,
+      "eval_samples_per_second": 15.238,
+      "eval_steps_per_second": 1.345,
+      "step": 500
+    },
+    {
+      "epoch": 0.7541784037558685,
+      "grad_norm": 0.3989259600639343,
+      "learning_rate": 0.00022098655571021735,
+      "loss": 2.1057,
+      "step": 502
+    },
+    {
+      "epoch": 0.7571830985915493,
+      "grad_norm": 0.38542553782463074,
+      "learning_rate": 0.00022032935200089958,
+      "loss": 2.1345,
+      "step": 504
+    },
+    {
+      "epoch": 0.7601877934272301,
+      "grad_norm": 0.3918050229549408,
+      "learning_rate": 0.00021967041323193707,
+      "loss": 2.1338,
+      "step": 506
+    },
+    {
+      "epoch": 0.7631924882629108,
+      "grad_norm": 0.4032893478870392,
+      "learning_rate": 0.0002190097556596728,
+      "loss": 2.1003,
+      "step": 508
+    },
+    {
+      "epoch": 0.7661971830985915,
+      "grad_norm": 0.38733917474746704,
+      "learning_rate": 0.00021834739558285342,
+      "loss": 2.1842,
+      "step": 510
+    },
+    {
+      "epoch": 0.7692018779342723,
+      "grad_norm": 0.37685641646385193,
+      "learning_rate": 0.00021768334934222725,
+      "loss": 2.0871,
+      "step": 512
+    },
+    {
+      "epoch": 0.7722065727699531,
+      "grad_norm": 0.3656997084617615,
+      "learning_rate": 0.00021701763332014103,
+      "loss": 2.0901,
+      "step": 514
+    },
+    {
+      "epoch": 0.7752112676056339,
+      "grad_norm": 0.38865405321121216,
+      "learning_rate": 0.00021635026394013602,
+      "loss": 2.1474,
+      "step": 516
+    },
+    {
+      "epoch": 0.7782159624413145,
+      "grad_norm": 0.3869773745536804,
+      "learning_rate": 0.00021568125766654236,
+      "loss": 2.121,
+      "step": 518
+    },
+    {
+      "epoch": 0.7812206572769953,
+      "grad_norm": 0.40487486124038696,
+      "learning_rate": 0.00021501063100407334,
+      "loss": 2.0802,
+      "step": 520
+    },
+    {
+      "epoch": 0.7842253521126761,
+      "grad_norm": 0.3867839574813843,
+      "learning_rate": 0.00021433840049741803,
+      "loss": 2.0966,
+      "step": 522
+    },
+    {
+      "epoch": 0.7872300469483569,
+      "grad_norm": 0.3910694420337677,
+      "learning_rate": 0.000213664582730833,
+      "loss": 2.1299,
+      "step": 524
+    },
+    {
+      "epoch": 0.7902347417840375,
+      "grad_norm": 0.38044509291648865,
+      "learning_rate": 0.00021298919432773347,
+      "loss": 2.0688,
+      "step": 526
+    },
+    {
+      "epoch": 0.7932394366197183,
+      "grad_norm": 0.4008919596672058,
+      "learning_rate": 0.00021231225195028297,
+      "loss": 2.0854,
+      "step": 528
+    },
+    {
+      "epoch": 0.7962441314553991,
+      "grad_norm": 0.36683255434036255,
+      "learning_rate": 0.00021163377229898225,
+      "loss": 2.0452,
+      "step": 530
+    },
+    {
+      "epoch": 0.7992488262910799,
+      "grad_norm": 0.37036463618278503,
+      "learning_rate": 0.0002109537721122574,
+      "loss": 2.1759,
+      "step": 532
+    },
+    {
+      "epoch": 0.8022535211267605,
+      "grad_norm": 0.38639795780181885,
+      "learning_rate": 0.00021027226816604702,
+      "loss": 2.031,
+      "step": 534
+    },
+    {
+      "epoch": 0.8052582159624413,
+      "grad_norm": 0.3881765305995941,
+      "learning_rate": 0.000209589277273388,
+      "loss": 2.0895,
+      "step": 536
+    },
+    {
+      "epoch": 0.8082629107981221,
+      "grad_norm": 0.37221747636795044,
+      "learning_rate": 0.00020890481628400097,
+      "loss": 2.0817,
+      "step": 538
+    },
+    {
+      "epoch": 0.8112676056338028,
+      "grad_norm": 0.3879324495792389,
+      "learning_rate": 0.00020821890208387467,
+      "loss": 2.1211,
+      "step": 540
+    },
+    {
+      "epoch": 0.8142723004694836,
+      "grad_norm": 0.3787749409675598,
+      "learning_rate": 0.0002075315515948492,
+      "loss": 2.0724,
+      "step": 542
+    },
+    {
+      "epoch": 0.8172769953051643,
+      "grad_norm": 0.3739571273326874,
+      "learning_rate": 0.00020684278177419854,
+      "loss": 2.0624,
+      "step": 544
+    },
+    {
+      "epoch": 0.8202816901408451,
+      "grad_norm": 0.3751803934574127,
+      "learning_rate": 0.00020615260961421238,
+      "loss": 2.1127,
+      "step": 546
+    },
+    {
+      "epoch": 0.8232863849765258,
+      "grad_norm": 0.38204455375671387,
+      "learning_rate": 0.00020546105214177678,
+      "loss": 2.1193,
+      "step": 548
+    },
+    {
+      "epoch": 0.8262910798122066,
+      "grad_norm": 0.3972371220588684,
+      "learning_rate": 0.00020476812641795407,
+      "loss": 2.1457,
+      "step": 550
+    },
+    {
+      "epoch": 0.8262910798122066,
+      "eval_loss": 1.9604527950286865,
+      "eval_runtime": 2.2284,
+      "eval_samples_per_second": 15.257,
+      "eval_steps_per_second": 1.346,
+      "step": 550
+    },
+    {
+      "epoch": 0.8292957746478873,
+      "grad_norm": 0.3509877026081085,
+      "learning_rate": 0.00020407384953756216,
+      "loss": 1.9992,
+      "step": 552
+    },
+    {
+      "epoch": 0.8323004694835681,
+      "grad_norm": 0.3898080587387085,
+      "learning_rate": 0.00020337823862875257,
+      "loss": 2.1867,
+      "step": 554
+    },
+    {
+      "epoch": 0.8353051643192488,
+      "grad_norm": 0.3941861093044281,
+      "learning_rate": 0.00020268131085258789,
+      "loss": 2.1109,
+      "step": 556
+    },
+    {
+      "epoch": 0.8383098591549296,
+      "grad_norm": 0.3842771351337433,
+      "learning_rate": 0.00020198308340261859,
+      "loss": 2.0762,
+      "step": 558
+    },
+    {
+      "epoch": 0.8413145539906103,
+      "grad_norm": 0.36597883701324463,
+      "learning_rate": 0.00020128357350445868,
+      "loss": 2.104,
+      "step": 560
+    },
+    {
+      "epoch": 0.8443192488262911,
+      "grad_norm": 0.3964192867279053,
+      "learning_rate": 0.00020058279841536075,
+      "loss": 2.0424,
+      "step": 562
+    },
+    {
+      "epoch": 0.8473239436619718,
+      "grad_norm": 0.38064396381378174,
+      "learning_rate": 0.00019988077542379033,
+      "loss": 2.0894,
+      "step": 564
+    },
+    {
+      "epoch": 0.8503286384976526,
+      "grad_norm": 0.3533835709095001,
+      "learning_rate": 0.00019917752184899938,
+      "loss": 2.1536,
+      "step": 566
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.3970707654953003,
+      "learning_rate": 0.00019847305504059888,
+      "loss": 2.1448,
+      "step": 568
+    },
+    {
+      "epoch": 0.856338028169014,
+      "grad_norm": 0.3679479956626892,
+      "learning_rate": 0.00019776739237813073,
+      "loss": 2.1038,
+      "step": 570
+    },
+    {
+      "epoch": 0.8593427230046948,
+      "grad_norm": 0.399521142244339,
+      "learning_rate": 0.00019706055127063942,
+      "loss": 2.1938,
+      "step": 572
+    },
+    {
+      "epoch": 0.8623474178403756,
+      "grad_norm": 0.36628666520118713,
+      "learning_rate": 0.0001963525491562421,
+      "loss": 2.0785,
+      "step": 574
+    },
+    {
+      "epoch": 0.8653521126760564,
+      "grad_norm": 0.38675469160079956,
+      "learning_rate": 0.00019564340350169854,
+      "loss": 2.1281,
+      "step": 576
+    },
+    {
+      "epoch": 0.868356807511737,
+      "grad_norm": 0.35931551456451416,
+      "learning_rate": 0.00019493313180198022,
+      "loss": 2.1402,
+      "step": 578
+    },
+    {
+      "epoch": 0.8713615023474178,
+      "grad_norm": 0.38222363591194153,
+      "learning_rate": 0.0001942217515798387,
+      "loss": 2.0282,
+      "step": 580
+    },
+    {
+      "epoch": 0.8743661971830986,
+      "grad_norm": 0.387198805809021,
+      "learning_rate": 0.00019350928038537336,
+      "loss": 2.1157,
+      "step": 582
+    },
+    {
+      "epoch": 0.8773708920187794,
+      "grad_norm": 0.3618201017379761,
+      "learning_rate": 0.00019279573579559836,
+      "loss": 2.0658,
+      "step": 584
+    },
+    {
+      "epoch": 0.88037558685446,
+      "grad_norm": 0.36946403980255127,
+      "learning_rate": 0.0001920811354140091,
+      "loss": 2.1084,
+      "step": 586
+    },
+    {
+      "epoch": 0.8833802816901408,
+      "grad_norm": 0.3920946717262268,
+      "learning_rate": 0.0001913654968701478,
+      "loss": 2.1193,
+      "step": 588
+    },
+    {
+      "epoch": 0.8863849765258216,
+      "grad_norm": 0.3728853166103363,
+      "learning_rate": 0.00019064883781916877,
+      "loss": 2.0556,
+      "step": 590
+    },
+    {
+      "epoch": 0.8893896713615024,
+      "grad_norm": 0.37712472677230835,
+      "learning_rate": 0.00018993117594140262,
+      "loss": 2.0859,
+      "step": 592
+    },
+    {
+      "epoch": 0.8923943661971832,
+      "grad_norm": 0.37663349509239197,
+      "learning_rate": 0.00018921252894192028,
+      "loss": 2.0963,
+      "step": 594
+    },
+    {
+      "epoch": 0.8953990610328638,
+      "grad_norm": 0.3913547694683075,
+      "learning_rate": 0.00018849291455009604,
+      "loss": 2.1573,
+      "step": 596
+    },
+    {
+      "epoch": 0.8984037558685446,
+      "grad_norm": 0.37650638818740845,
+      "learning_rate": 0.00018777235051917025,
+      "loss": 2.015,
+      "step": 598
+    },
+    {
+      "epoch": 0.9014084507042254,
+      "grad_norm": 0.39134547114372253,
+      "learning_rate": 0.00018705085462581146,
+      "loss": 2.0742,
+      "step": 600
+    },
+    {
+      "epoch": 0.9014084507042254,
+      "eval_loss": 1.9470399618148804,
+      "eval_runtime": 2.2316,
+      "eval_samples_per_second": 15.236,
+      "eval_steps_per_second": 1.344,
+      "step": 600
+    },
+    {
+      "epoch": 0.9044131455399061,
+      "grad_norm": 0.37611451745033264,
+      "learning_rate": 0.00018632844466967744,
+      "loss": 2.0821,
+      "step": 602
+    },
+    {
+      "epoch": 0.9074178403755868,
+      "grad_norm": 0.3889780342578888,
+      "learning_rate": 0.00018560513847297664,
+      "loss": 2.065,
+      "step": 604
+    },
+    {
+      "epoch": 0.9104225352112676,
+      "grad_norm": 0.39513203501701355,
+      "learning_rate": 0.00018488095388002798,
+      "loss": 2.048,
+      "step": 606
+    },
+    {
+      "epoch": 0.9134272300469484,
+      "grad_norm": 0.37584614753723145,
+      "learning_rate": 0.00018415590875682093,
+      "loss": 2.0756,
+      "step": 608
+    },
+    {
+      "epoch": 0.9164319248826291,
+      "grad_norm": 0.37283995747566223,
+      "learning_rate": 0.00018343002099057475,
+      "loss": 2.0676,
+      "step": 610
+    },
+    {
+      "epoch": 0.9194366197183098,
+      "grad_norm": 0.3875684142112732,
+      "learning_rate": 0.00018270330848929698,
+      "loss": 2.0659,
+      "step": 612
+    },
+    {
+      "epoch": 0.9224413145539906,
+      "grad_norm": 0.3838404715061188,
+      "learning_rate": 0.0001819757891813418,
+      "loss": 2.0648,
+      "step": 614
+    },
+    {
+      "epoch": 0.9254460093896714,
+      "grad_norm": 0.37943315505981445,
+      "learning_rate": 0.00018124748101496784,
+      "loss": 2.07,
+      "step": 616
+    },
+    {
+      "epoch": 0.9284507042253521,
+      "grad_norm": 0.38859301805496216,
+      "learning_rate": 0.00018051840195789506,
+      "loss": 1.9994,
+      "step": 618
+    },
+    {
+      "epoch": 0.9314553990610329,
+      "grad_norm": 0.3987753987312317,
+      "learning_rate": 0.0001797885699968618,
+      "loss": 2.0957,
+      "step": 620
+    },
+    {
+      "epoch": 0.9344600938967136,
+      "grad_norm": 0.3856673240661621,
+      "learning_rate": 0.0001790580031371809,
+      "loss": 2.1199,
+      "step": 622
+    },
+    {
+      "epoch": 0.9374647887323944,
+      "grad_norm": 0.39979681372642517,
+      "learning_rate": 0.00017832671940229547,
+      "loss": 2.0303,
+      "step": 624
+    },
+    {
+      "epoch": 0.9404694835680751,
+      "grad_norm": 0.3718721270561218,
+      "learning_rate": 0.00017759473683333428,
+      "loss": 2.089,
+      "step": 626
+    },
+    {
+      "epoch": 0.9434741784037559,
+      "grad_norm": 0.38398197293281555,
+      "learning_rate": 0.00017686207348866675,
+      "loss": 2.0357,
+      "step": 628
+    },
+    {
+      "epoch": 0.9464788732394366,
+      "grad_norm": 0.3931136727333069,
+      "learning_rate": 0.00017612874744345728,
+      "loss": 1.9967,
+      "step": 630
+    },
+    {
+      "epoch": 0.9494835680751174,
+      "grad_norm": 0.37366238236427307,
+      "learning_rate": 0.00017539477678921945,
+      "loss": 2.0203,
+      "step": 632
+    },
+    {
+      "epoch": 0.9524882629107981,
+      "grad_norm": 0.3974733352661133,
+      "learning_rate": 0.00017466017963336971,
+      "loss": 1.9708,
+      "step": 634
+    },
+    {
+      "epoch": 0.9554929577464789,
+      "grad_norm": 0.3970566391944885,
+      "learning_rate": 0.00017392497409878058,
+      "loss": 2.1199,
+      "step": 636
+    },
+    {
+      "epoch": 0.9584976525821596,
+      "grad_norm": 0.3735021948814392,
+      "learning_rate": 0.00017318917832333353,
+      "loss": 2.067,
+      "step": 638
+    },
+    {
+      "epoch": 0.9615023474178404,
+      "grad_norm": 0.3791767954826355,
+      "learning_rate": 0.00017245281045947164,
+      "loss": 2.0029,
+      "step": 640
+    },
+    {
+      "epoch": 0.9645070422535211,
+      "grad_norm": 0.37752142548561096,
+      "learning_rate": 0.00017171588867375166,
+      "loss": 2.0911,
+      "step": 642
+    },
+    {
+      "epoch": 0.9675117370892019,
+      "grad_norm": 0.38177889585494995,
+      "learning_rate": 0.0001709784311463958,
+      "loss": 2.0631,
+      "step": 644
+    },
+    {
+      "epoch": 0.9705164319248827,
+      "grad_norm": 0.3695722222328186,
+      "learning_rate": 0.00017024045607084344,
+      "loss": 2.0545,
+      "step": 646
+    },
+    {
+      "epoch": 0.9735211267605633,
+      "grad_norm": 0.39020925760269165,
+      "learning_rate": 0.00016950198165330198,
+      "loss": 2.1177,
+      "step": 648
+    },
+    {
+      "epoch": 0.9765258215962441,
+      "grad_norm": 0.3784434497356415,
+      "learning_rate": 0.00016876302611229792,
+      "loss": 2.0058,
+      "step": 650
+    },
+    {
+      "epoch": 0.9765258215962441,
+      "eval_loss": 1.91607666015625,
+      "eval_runtime": 2.2319,
+      "eval_samples_per_second": 15.234,
+      "eval_steps_per_second": 1.344,
+      "step": 650
+    },
+    {
+      "epoch": 0.9795305164319249,
+      "grad_norm": 0.3931000828742981,
+      "learning_rate": 0.00016802360767822718,
+      "loss": 2.0495,
+      "step": 652
+    },
+    {
+      "epoch": 0.9825352112676057,
+      "grad_norm": 0.3686800003051758,
+      "learning_rate": 0.0001672837445929057,
+      "loss": 1.9978,
+      "step": 654
+    },
+    {
+      "epoch": 0.9855399061032863,
+      "grad_norm": 0.37565842270851135,
+      "learning_rate": 0.00016654345510911896,
+      "loss": 2.0883,
+      "step": 656
+    },
+    {
+      "epoch": 0.9885446009389671,
+      "grad_norm": 0.38410088419914246,
+      "learning_rate": 0.00016580275749017204,
+      "loss": 2.0835,
+      "step": 658
+    },
+    {
+      "epoch": 0.9915492957746479,
+      "grad_norm": 0.37980878353118896,
+      "learning_rate": 0.0001650616700094389,
+      "loss": 2.0942,
+      "step": 660
+    },
+    {
+      "epoch": 0.9945539906103287,
+      "grad_norm": 0.36653536558151245,
+      "learning_rate": 0.0001643202109499115,
+      "loss": 2.1472,
+      "step": 662
+    },
+    {
+      "epoch": 0.9975586854460093,
+      "grad_norm": 0.36443179845809937,
+      "learning_rate": 0.0001635783986037489,
+      "loss": 2.1674,
+      "step": 664
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.45171236991882324,
+      "learning_rate": 0.00016283625127182596,
+      "loss": 2.0493,
+      "step": 666
+    },
+    {
+      "epoch": 1.0030046948356808,
+      "grad_norm": 0.365464448928833,
+      "learning_rate": 0.00016209378726328167,
+      "loss": 1.997,
+      "step": 668
+    },
+    {
+      "epoch": 1.0060093896713616,
+      "grad_norm": 0.4012795686721802,
+      "learning_rate": 0.00016135102489506768,
+      "loss": 2.0016,
+      "step": 670
+    },
+    {
+      "epoch": 1.0090140845070423,
+      "grad_norm": 0.41814184188842773,
+      "learning_rate": 0.00016060798249149628,
+      "loss": 1.9394,
+      "step": 672
+    },
+    {
+      "epoch": 1.012018779342723,
+      "grad_norm": 0.3885805308818817,
+      "learning_rate": 0.00015986467838378847,
+      "loss": 2.0091,
+      "step": 674
+    },
+    {
+      "epoch": 1.0150234741784037,
+      "grad_norm": 0.41345134377479553,
+      "learning_rate": 0.00015912113090962146,
+      "loss": 1.9516,
+      "step": 676
+    },
+    {
+      "epoch": 1.0180281690140844,
+      "grad_norm": 0.4025941789150238,
+      "learning_rate": 0.0001583773584126766,
+      "loss": 2.0444,
+      "step": 678
+    },
+    {
+      "epoch": 1.0210328638497652,
+      "grad_norm": 0.39452236890792847,
+      "learning_rate": 0.0001576333792421865,
+      "loss": 1.9332,
+      "step": 680
+    },
+    {
+      "epoch": 1.024037558685446,
+      "grad_norm": 0.40509164333343506,
+      "learning_rate": 0.00015688921175248253,
+      "loss": 1.9198,
+      "step": 682
+    },
+    {
+      "epoch": 1.0270422535211268,
+      "grad_norm": 0.3902369439601898,
+      "learning_rate": 0.00015614487430254214,
+      "loss": 1.9223,
+      "step": 684
+    },
+    {
+      "epoch": 1.0300469483568075,
+      "grad_norm": 0.40739771723747253,
+      "learning_rate": 0.00015540038525553563,
+      "loss": 1.9946,
+      "step": 686
+    },
+    {
+      "epoch": 1.0330516431924883,
+      "grad_norm": 0.38803932070732117,
+      "learning_rate": 0.00015465576297837334,
+      "loss": 2.0214,
+      "step": 688
+    },
+    {
+      "epoch": 1.036056338028169,
+      "grad_norm": 0.40106263756752014,
+      "learning_rate": 0.0001539110258412525,
+      "loss": 1.9985,
+      "step": 690
+    },
+    {
+      "epoch": 1.0390610328638497,
+      "grad_norm": 0.39057457447052,
+      "learning_rate": 0.00015316619221720387,
+      "loss": 1.923,
+      "step": 692
+    },
+    {
+      "epoch": 1.0420657276995304,
+      "grad_norm": 0.40112224221229553,
+      "learning_rate": 0.00015242128048163864,
+      "loss": 2.0371,
+      "step": 694
+    },
+    {
+      "epoch": 1.0450704225352112,
+      "grad_norm": 0.38916274905204773,
+      "learning_rate": 0.00015167630901189512,
+      "loss": 1.992,
+      "step": 696
+    },
+    {
+      "epoch": 1.048075117370892,
+      "grad_norm": 0.38228753209114075,
+      "learning_rate": 0.00015093129618678526,
+      "loss": 2.0065,
+      "step": 698
+    },
+    {
+      "epoch": 1.0510798122065728,
+      "grad_norm": 0.3931009769439697,
+      "learning_rate": 0.0001501862603861412,
+      "loss": 2.0016,
+      "step": 700
+    },
+    {
+      "epoch": 1.0510798122065728,
+      "eval_loss": 1.8976900577545166,
+      "eval_runtime": 2.2303,
+      "eval_samples_per_second": 15.244,
+      "eval_steps_per_second": 1.345,
+      "step": 700
+    },
+    {
+      "epoch": 1.0540845070422535,
+      "grad_norm": 0.38756102323532104,
+      "learning_rate": 0.00014944121999036194,
+      "loss": 2.0403,
+      "step": 702
+    },
+    {
+      "epoch": 1.0570892018779343,
+      "grad_norm": 0.3809608817100525,
+      "learning_rate": 0.00014869619337995991,
+      "loss": 1.9654,
+      "step": 704
+    },
+    {
+      "epoch": 1.060093896713615,
+      "grad_norm": 0.3908007740974426,
+      "learning_rate": 0.00014795119893510735,
+      "loss": 2.0678,
+      "step": 706
+    },
+    {
+      "epoch": 1.0630985915492959,
+      "grad_norm": 0.37967926263809204,
+      "learning_rate": 0.00014720625503518298,
+      "loss": 1.9406,
+      "step": 708
+    },
+    {
+      "epoch": 1.0661032863849764,
+      "grad_norm": 0.39903658628463745,
+      "learning_rate": 0.0001464613800583186,
+      "loss": 1.9143,
+      "step": 710
+    },
+    {
+      "epoch": 1.0691079812206572,
+      "grad_norm": 0.39886537194252014,
+      "learning_rate": 0.00014571659238094556,
+      "loss": 2.0404,
+      "step": 712
+    },
+    {
+      "epoch": 1.072112676056338,
+      "grad_norm": 0.3941059410572052,
+      "learning_rate": 0.0001449719103773416,
+      "loss": 1.9827,
+      "step": 714
+    },
+    {
+      "epoch": 1.0751173708920188,
+      "grad_norm": 0.405575692653656,
+      "learning_rate": 0.00014422735241917736,
+      "loss": 2.037,
+      "step": 716
+    },
+    {
+      "epoch": 1.0781220657276995,
+      "grad_norm": 0.4031895101070404,
+      "learning_rate": 0.0001434829368750633,
+      "loss": 2.0518,
+      "step": 718
+    },
+    {
+      "epoch": 1.0811267605633803,
+      "grad_norm": 0.38887569308280945,
+      "learning_rate": 0.00014273868211009624,
+      "loss": 1.9411,
+      "step": 720
+    },
+    {
+      "epoch": 1.084131455399061,
+      "grad_norm": 0.39133763313293457,
+      "learning_rate": 0.0001419946064854068,
+      "loss": 2.0017,
+      "step": 722
+    },
+    {
+      "epoch": 1.0871361502347419,
+      "grad_norm": 0.39030545949935913,
+      "learning_rate": 0.00014125072835770595,
+      "loss": 2.0288,
+      "step": 724
+    },
+    {
+      "epoch": 1.0901408450704226,
+      "grad_norm": 0.37747108936309814,
+      "learning_rate": 0.00014050706607883227,
+      "loss": 1.9886,
+      "step": 726
+    },
+    {
+      "epoch": 1.0931455399061032,
+      "grad_norm": 0.3952395021915436,
+      "learning_rate": 0.00013976363799529936,
+      "loss": 1.9998,
+      "step": 728
+    },
+    {
+      "epoch": 1.096150234741784,
+      "grad_norm": 0.40994882583618164,
+      "learning_rate": 0.00013902046244784305,
+      "loss": 1.996,
+      "step": 730
+    },
+    {
+      "epoch": 1.0991549295774647,
+      "grad_norm": 0.3839206397533417,
+      "learning_rate": 0.00013827755777096892,
+      "loss": 1.9978,
+      "step": 732
+    },
+    {
+      "epoch": 1.1021596244131455,
+      "grad_norm": 0.39340534806251526,
+      "learning_rate": 0.0001375349422925002,
+      "loss": 2.0202,
+      "step": 734
+    },
+    {
+      "epoch": 1.1051643192488263,
+      "grad_norm": 0.4123769700527191,
+      "learning_rate": 0.00013679263433312533,
+      "loss": 1.9633,
+      "step": 736
+    },
+    {
+      "epoch": 1.108169014084507,
+      "grad_norm": 0.3920431137084961,
+      "learning_rate": 0.00013605065220594604,
+      "loss": 1.9387,
+      "step": 738
+    },
+    {
+      "epoch": 1.1111737089201879,
+      "grad_norm": 0.3928292691707611,
+      "learning_rate": 0.00013530901421602586,
+      "loss": 1.988,
+      "step": 740
+    },
+    {
+      "epoch": 1.1141784037558686,
+      "grad_norm": 0.39087823033332825,
+      "learning_rate": 0.00013456773865993808,
+      "loss": 1.9796,
+      "step": 742
+    },
+    {
+      "epoch": 1.1171830985915494,
+      "grad_norm": 0.41001343727111816,
+      "learning_rate": 0.0001338268438253146,
+      "loss": 2.042,
+      "step": 744
+    },
+    {
+      "epoch": 1.12018779342723,
+      "grad_norm": 0.3934822976589203,
+      "learning_rate": 0.00013308634799039478,
+      "loss": 1.9797,
+      "step": 746
+    },
+    {
+      "epoch": 1.1231924882629107,
+      "grad_norm": 0.3972378671169281,
+      "learning_rate": 0.00013234626942357447,
+      "loss": 1.8818,
+      "step": 748
+    },
+    {
+      "epoch": 1.1261971830985915,
+      "grad_norm": 0.3940439820289612,
+      "learning_rate": 0.00013160662638295526,
+      "loss": 2.0144,
+      "step": 750
+    },
+    {
+      "epoch": 1.1261971830985915,
+      "eval_loss": 1.8840441703796387,
+      "eval_runtime": 2.2298,
+      "eval_samples_per_second": 15.248,
+      "eval_steps_per_second": 1.345,
+      "step": 750
+    },
+    {
+      "epoch": 1.1292018779342723,
+      "grad_norm": 0.3927149176597595,
+      "learning_rate": 0.00013086743711589405,
+      "loss": 2.0484,
+      "step": 752
+    },
+    {
+      "epoch": 1.132206572769953,
+      "grad_norm": 0.39765027165412903,
+      "learning_rate": 0.0001301287198585531,
+      "loss": 2.0222,
+      "step": 754
+    },
+    {
+      "epoch": 1.1352112676056338,
+      "grad_norm": 0.4069582521915436,
+      "learning_rate": 0.00012939049283544978,
+      "loss": 1.9525,
+      "step": 756
+    },
+    {
+      "epoch": 1.1382159624413146,
+      "grad_norm": 0.40190741419792175,
+      "learning_rate": 0.00012865277425900724,
+      "loss": 1.9164,
+      "step": 758
+    },
+    {
+      "epoch": 1.1412206572769954,
+      "grad_norm": 0.3975595533847809,
+      "learning_rate": 0.000127915582329105,
+      "loss": 1.9604,
+      "step": 760
+    },
+    {
+      "epoch": 1.144225352112676,
+      "grad_norm": 0.41358593106269836,
+      "learning_rate": 0.0001271789352326298,
+      "loss": 1.936,
+      "step": 762
+    },
+    {
+      "epoch": 1.1472300469483567,
+      "grad_norm": 0.40639400482177734,
+      "learning_rate": 0.00012644285114302736,
+      "loss": 2.0311,
+      "step": 764
+    },
+    {
+      "epoch": 1.1502347417840375,
+      "grad_norm": 0.40582284331321716,
+      "learning_rate": 0.00012570734821985347,
+      "loss": 2.0382,
+      "step": 766
+    },
+    {
+      "epoch": 1.1532394366197183,
+      "grad_norm": 0.4027571976184845,
+      "learning_rate": 0.00012497244460832641,
+      "loss": 1.9412,
+      "step": 768
+    },
+    {
+      "epoch": 1.156244131455399,
+      "grad_norm": 0.3900875151157379,
+      "learning_rate": 0.00012423815843887913,
+      "loss": 2.0115,
+      "step": 770
+    },
+    {
+      "epoch": 1.1592488262910798,
+      "grad_norm": 0.39341413974761963,
+      "learning_rate": 0.0001235045078267119,
+      "loss": 1.982,
+      "step": 772
+    },
+    {
+      "epoch": 1.1622535211267606,
+      "grad_norm": 0.39579978585243225,
+      "learning_rate": 0.00012277151087134556,
+      "loss": 1.917,
+      "step": 774
+    },
+    {
+      "epoch": 1.1652582159624414,
+      "grad_norm": 0.3882204294204712,
+      "learning_rate": 0.00012203918565617487,
+      "loss": 1.9636,
+      "step": 776
+    },
+    {
+      "epoch": 1.1682629107981222,
+      "grad_norm": 0.4042316675186157,
+      "learning_rate": 0.00012130755024802252,
+      "loss": 2.0353,
+      "step": 778
+    },
+    {
+      "epoch": 1.1712676056338027,
+      "grad_norm": 0.40069130063056946,
+      "learning_rate": 0.00012057662269669318,
+      "loss": 1.9377,
+      "step": 780
+    },
+    {
+      "epoch": 1.1742723004694835,
+      "grad_norm": 0.40316134691238403,
+      "learning_rate": 0.00011984642103452841,
+      "loss": 1.9763,
+      "step": 782
+    },
+    {
+      "epoch": 1.1772769953051643,
+      "grad_norm": 0.4143735468387604,
+      "learning_rate": 0.00011911696327596183,
+      "loss": 1.9874,
+      "step": 784
+    },
+    {
+      "epoch": 1.180281690140845,
+      "grad_norm": 0.4170506000518799,
+      "learning_rate": 0.00011838826741707434,
+      "loss": 1.9869,
+      "step": 786
+    },
+    {
+      "epoch": 1.1832863849765258,
+      "grad_norm": 0.3935900032520294,
+      "learning_rate": 0.00011766035143515075,
+      "loss": 1.9594,
+      "step": 788
+    },
+    {
+      "epoch": 1.1862910798122066,
+      "grad_norm": 0.3949354290962219,
+      "learning_rate": 0.0001169332332882356,
+      "loss": 1.9516,
+      "step": 790
+    },
+    {
+      "epoch": 1.1892957746478874,
+      "grad_norm": 0.40487349033355713,
+      "learning_rate": 0.00011620693091469065,
+      "loss": 1.9753,
+      "step": 792
+    },
+    {
+      "epoch": 1.1923004694835682,
+      "grad_norm": 0.4178490936756134,
+      "learning_rate": 0.00011548146223275205,
+      "loss": 1.9727,
+      "step": 794
+    },
+    {
+      "epoch": 1.1953051643192487,
+      "grad_norm": 0.38711851835250854,
+      "learning_rate": 0.00011475684514008831,
+      "loss": 1.971,
+      "step": 796
+    },
+    {
+      "epoch": 1.1983098591549295,
+      "grad_norm": 0.3943468928337097,
+      "learning_rate": 0.00011403309751335898,
+      "loss": 1.998,
+      "step": 798
+    },
+    {
+      "epoch": 1.2013145539906103,
+      "grad_norm": 0.403125524520874,
+      "learning_rate": 0.0001133102372077733,
+      "loss": 2.0383,
+      "step": 800
+    },
+    {
+      "epoch": 1.2013145539906103,
+      "eval_loss": 1.857165813446045,
+      "eval_runtime": 2.2303,
+      "eval_samples_per_second": 15.245,
+      "eval_steps_per_second": 1.345,
+      "step": 800
+    },
+    {
+      "epoch": 1.204319248826291,
+      "grad_norm": 0.4021073877811432,
+      "learning_rate": 0.00011258828205664994,
+      "loss": 1.9776,
+      "step": 802
+    },
+    {
+      "epoch": 1.2073239436619718,
+      "grad_norm": 0.411634624004364,
+      "learning_rate": 0.00011186724987097698,
+      "loss": 1.9153,
+      "step": 804
+    },
+    {
+      "epoch": 1.2103286384976526,
+      "grad_norm": 0.38922935724258423,
+      "learning_rate": 0.00011114715843897243,
+      "loss": 1.9365,
+      "step": 806
+    },
+    {
+      "epoch": 1.2133333333333334,
+      "grad_norm": 0.39284008741378784,
+      "learning_rate": 0.00011042802552564543,
+      "loss": 1.952,
+      "step": 808
+    },
+    {
+      "epoch": 1.2163380281690142,
+      "grad_norm": 0.39748966693878174,
+      "learning_rate": 0.00010970986887235808,
+      "loss": 1.9262,
+      "step": 810
+    },
+    {
+      "epoch": 1.219342723004695,
+      "grad_norm": 0.3940993845462799,
+      "learning_rate": 0.00010899270619638768,
+      "loss": 2.0065,
+      "step": 812
+    },
+    {
+      "epoch": 1.2223474178403757,
+      "grad_norm": 0.3971916735172272,
+      "learning_rate": 0.00010827655519048951,
+      "loss": 1.9999,
+      "step": 814
+    },
+    {
+      "epoch": 1.2253521126760563,
+      "grad_norm": 0.39418497681617737,
+      "learning_rate": 0.00010756143352246047,
+      "loss": 2.0626,
+      "step": 816
+    },
+    {
+      "epoch": 1.228356807511737,
+      "grad_norm": 0.40934550762176514,
+      "learning_rate": 0.00010684735883470331,
+      "loss": 1.9853,
+      "step": 818
+    },
+    {
+      "epoch": 1.2313615023474178,
+      "grad_norm": 0.3775435984134674,
+      "learning_rate": 0.00010613434874379113,
+      "loss": 2.0527,
+      "step": 820
+    },
+    {
+      "epoch": 1.2343661971830986,
+      "grad_norm": 0.40113934874534607,
+      "learning_rate": 0.00010542242084003294,
+      "loss": 1.9745,
+      "step": 822
+    },
+    {
+      "epoch": 1.2373708920187794,
+      "grad_norm": 0.40606534481048584,
+      "learning_rate": 0.00010471159268703971,
+      "loss": 1.9655,
+      "step": 824
+    },
+    {
+      "epoch": 1.2403755868544601,
+      "grad_norm": 0.4148097634315491,
+      "learning_rate": 0.00010400188182129094,
+      "loss": 1.9503,
+      "step": 826
+    },
+    {
+      "epoch": 1.243380281690141,
+      "grad_norm": 0.39626210927963257,
+      "learning_rate": 0.0001032933057517022,
+      "loss": 1.8861,
+      "step": 828
+    },
+    {
+      "epoch": 1.2463849765258215,
+      "grad_norm": 0.41403472423553467,
+      "learning_rate": 0.000102585881959193,
+      "loss": 1.9898,
+      "step": 830
+    },
+    {
+      "epoch": 1.2493896713615023,
+      "grad_norm": 0.3943893015384674,
+      "learning_rate": 0.00010187962789625561,
+      "loss": 1.9193,
+      "step": 832
+    },
+    {
+      "epoch": 1.252394366197183,
+      "grad_norm": 0.3905406594276428,
+      "learning_rate": 0.0001011745609865246,
+      "loss": 1.9947,
+      "step": 834
+    },
+    {
+      "epoch": 1.2553990610328638,
+      "grad_norm": 0.4018987715244293,
+      "learning_rate": 0.00010047069862434668,
+      "loss": 1.9247,
+      "step": 836
+    },
+    {
+      "epoch": 1.2584037558685446,
+      "grad_norm": 0.39905279874801636,
+      "learning_rate": 9.976805817435207e-05,
+      "loss": 1.9404,
+      "step": 838
+    },
+    {
+      "epoch": 1.2614084507042254,
+      "grad_norm": 0.399989515542984,
+      "learning_rate": 9.906665697102556e-05,
+      "loss": 2.0097,
+      "step": 840
+    },
+    {
+      "epoch": 1.2644131455399061,
+      "grad_norm": 0.40526479482650757,
+      "learning_rate": 9.836651231827927e-05,
+      "loss": 1.9914,
+      "step": 842
+    },
+    {
+      "epoch": 1.267417840375587,
+      "grad_norm": 0.39594128727912903,
+      "learning_rate": 9.766764148902554e-05,
+      "loss": 1.9343,
+      "step": 844
+    },
+    {
+      "epoch": 1.2704225352112677,
+      "grad_norm": 0.41465577483177185,
+      "learning_rate": 9.69700617247508e-05,
+      "loss": 2.0117,
+      "step": 846
+    },
+    {
+      "epoch": 1.2734272300469485,
+      "grad_norm": 0.3978896737098694,
+      "learning_rate": 9.627379023509041e-05,
+      "loss": 1.9999,
+      "step": 848
+    },
+    {
+      "epoch": 1.2764319248826292,
+      "grad_norm": 0.408538281917572,
+      "learning_rate": 9.557884419740386e-05,
+      "loss": 1.9937,
+      "step": 850
+    },
+    {
+      "epoch": 1.2764319248826292,
+      "eval_loss": 1.8352789878845215,
+      "eval_runtime": 2.2335,
+      "eval_samples_per_second": 15.223,
+      "eval_steps_per_second": 1.343,
+      "step": 850
+    },
+    {
+      "epoch": 1.2794366197183098,
+      "grad_norm": 0.40342220664024353,
+      "learning_rate": 9.488524075635109e-05,
+      "loss": 2.0279,
+      "step": 852
+    },
+    {
+      "epoch": 1.2824413145539906,
+      "grad_norm": 0.39497530460357666,
+      "learning_rate": 9.419299702346957e-05,
+      "loss": 1.946,
+      "step": 854
+    },
+    {
+      "epoch": 1.2854460093896714,
+      "grad_norm": 0.4008805453777313,
+      "learning_rate": 9.350213007675206e-05,
+      "loss": 2.0079,
+      "step": 856
+    },
+    {
+      "epoch": 1.2884507042253521,
+      "grad_norm": 0.39565256237983704,
+      "learning_rate": 9.281265696022533e-05,
+      "loss": 1.9417,
+      "step": 858
+    },
+    {
+      "epoch": 1.291455399061033,
+      "grad_norm": 0.41115814447402954,
+      "learning_rate": 9.212459468352966e-05,
+      "loss": 1.9208,
+      "step": 860
+    },
+    {
+      "epoch": 1.2944600938967137,
+      "grad_norm": 0.41735655069351196,
+      "learning_rate": 9.143796022149936e-05,
+      "loss": 1.9606,
+      "step": 862
+    },
+    {
+      "epoch": 1.2974647887323942,
+      "grad_norm": 0.41069456934928894,
+      "learning_rate": 9.075277051374364e-05,
+      "loss": 1.9674,
+      "step": 864
+    },
+    {
+      "epoch": 1.300469483568075,
+      "grad_norm": 0.38782206177711487,
+      "learning_rate": 9.006904246422904e-05,
+      "loss": 1.9045,
+      "step": 866
+    },
+    {
+      "epoch": 1.3034741784037558,
+      "grad_norm": 0.39365893602371216,
+      "learning_rate": 8.938679294086225e-05,
+      "loss": 1.9131,
+      "step": 868
+    },
+    {
+      "epoch": 1.3064788732394366,
+      "grad_norm": 0.41544634103775024,
+      "learning_rate": 8.870603877507399e-05,
+      "loss": 2.0275,
+      "step": 870
+    },
+    {
+      "epoch": 1.3094835680751173,
+      "grad_norm": 0.4032615125179291,
+      "learning_rate": 8.802679676140372e-05,
+      "loss": 2.0104,
+      "step": 872
+    },
+    {
+      "epoch": 1.3124882629107981,
+      "grad_norm": 0.4120880365371704,
+      "learning_rate": 8.734908365708548e-05,
+      "loss": 1.9859,
+      "step": 874
+    },
+    {
+      "epoch": 1.315492957746479,
+      "grad_norm": 0.41044726967811584,
+      "learning_rate": 8.667291618163432e-05,
+      "loss": 1.969,
+      "step": 876
+    },
+    {
+      "epoch": 1.3184976525821597,
+      "grad_norm": 0.41725054383277893,
+      "learning_rate": 8.599831101643377e-05,
+      "loss": 1.9068,
+      "step": 878
+    },
+    {
+      "epoch": 1.3215023474178405,
+      "grad_norm": 0.3953832685947418,
+      "learning_rate": 8.532528480432448e-05,
+      "loss": 1.9555,
+      "step": 880
+    },
+    {
+      "epoch": 1.3245070422535212,
+      "grad_norm": 0.40860870480537415,
+      "learning_rate": 8.465385414919363e-05,
+      "loss": 1.9989,
+      "step": 882
+    },
+    {
+      "epoch": 1.327511737089202,
+      "grad_norm": 0.4132196307182312,
+      "learning_rate": 8.398403561556506e-05,
+      "loss": 1.9419,
+      "step": 884
+    },
+    {
+      "epoch": 1.3305164319248826,
+      "grad_norm": 0.4141290783882141,
+      "learning_rate": 8.331584572819097e-05,
+      "loss": 2.0309,
+      "step": 886
+    },
+    {
+      "epoch": 1.3335211267605633,
+      "grad_norm": 0.40374478697776794,
+      "learning_rate": 8.26493009716439e-05,
+      "loss": 2.0116,
+      "step": 888
+    },
+    {
+      "epoch": 1.3365258215962441,
+      "grad_norm": 0.4018094539642334,
+      "learning_rate": 8.198441778991025e-05,
+      "loss": 2.0441,
+      "step": 890
+    },
+    {
+      "epoch": 1.339530516431925,
+      "grad_norm": 0.4007222354412079,
+      "learning_rate": 8.132121258598459e-05,
+      "loss": 1.8889,
+      "step": 892
+    },
+    {
+      "epoch": 1.3425352112676057,
+      "grad_norm": 0.39520931243896484,
+      "learning_rate": 8.065970172146483e-05,
+      "loss": 1.8656,
+      "step": 894
+    },
+    {
+      "epoch": 1.3455399061032864,
+      "grad_norm": 0.40495213866233826,
+      "learning_rate": 7.999990151614894e-05,
+      "loss": 2.0523,
+      "step": 896
+    },
+    {
+      "epoch": 1.348544600938967,
+      "grad_norm": 0.4034924805164337,
+      "learning_rate": 7.934182824763187e-05,
+      "loss": 1.9728,
+      "step": 898
+    },
+    {
+      "epoch": 1.3515492957746478,
+      "grad_norm": 0.4150352478027344,
+      "learning_rate": 7.868549815090424e-05,
+      "loss": 1.9782,
+      "step": 900
+    },
+    {
+      "epoch": 1.3515492957746478,
+      "eval_loss": 1.822741985321045,
+      "eval_runtime": 2.2313,
+      "eval_samples_per_second": 15.237,
+      "eval_steps_per_second": 1.344,
+      "step": 900
+    },
+    {
+      "epoch": 1.3545539906103286,
+      "grad_norm": 0.40226149559020996,
+      "learning_rate": 7.803092741795183e-05,
+      "loss": 2.0248,
+      "step": 902
+    },
+    {
+      "epoch": 1.3575586854460093,
+      "grad_norm": 0.3810630440711975,
+      "learning_rate": 7.737813219735598e-05,
+      "loss": 1.9544,
+      "step": 904
+    },
+    {
+      "epoch": 1.36056338028169,
+      "grad_norm": 0.3970591723918915,
+      "learning_rate": 7.672712859389523e-05,
+      "loss": 1.9887,
+      "step": 906
+    },
+    {
+      "epoch": 1.3635680751173709,
+      "grad_norm": 0.4038551151752472,
+      "learning_rate": 7.60779326681482e-05,
+      "loss": 2.0135,
+      "step": 908
+    },
+    {
+      "epoch": 1.3665727699530517,
+      "grad_norm": 0.4295276403427124,
+      "learning_rate": 7.543056043609716e-05,
+      "loss": 1.9749,
+      "step": 910
+    },
+    {
+      "epoch": 1.3695774647887324,
+      "grad_norm": 0.40757429599761963,
+      "learning_rate": 7.478502786873287e-05,
+      "loss": 2.0091,
+      "step": 912
+    },
+    {
+      "epoch": 1.3725821596244132,
+      "grad_norm": 0.4092368483543396,
+      "learning_rate": 7.414135089166073e-05,
+      "loss": 1.9814,
+      "step": 914
+    },
+    {
+      "epoch": 1.375586854460094,
+      "grad_norm": 0.3876953125,
+      "learning_rate": 7.34995453847078e-05,
+      "loss": 1.9932,
+      "step": 916
+    },
+    {
+      "epoch": 1.3785915492957748,
+      "grad_norm": 0.398101806640625,
+      "learning_rate": 7.285962718153098e-05,
+      "loss": 2.0123,
+      "step": 918
+    },
+    {
+      "epoch": 1.3815962441314553,
+      "grad_norm": 0.41104593873023987,
+      "learning_rate": 7.222161206922668e-05,
+      "loss": 1.9644,
+      "step": 920
+    },
+    {
+      "epoch": 1.384600938967136,
+      "grad_norm": 0.4044182002544403,
+      "learning_rate": 7.158551578794088e-05,
+      "loss": 1.9458,
+      "step": 922
+    },
+    {
+      "epoch": 1.3876056338028169,
+      "grad_norm": 0.4081133008003235,
+      "learning_rate": 7.095135403048119e-05,
+      "loss": 1.9607,
+      "step": 924
+    },
+    {
+      "epoch": 1.3906103286384977,
+      "grad_norm": 0.387970507144928,
+      "learning_rate": 7.031914244192952e-05,
+      "loss": 1.9329,
+      "step": 926
+    },
+    {
+      "epoch": 1.3936150234741784,
+      "grad_norm": 0.40578800439834595,
+      "learning_rate": 6.968889661925618e-05,
+      "loss": 2.0033,
+      "step": 928
+    },
+    {
+      "epoch": 1.3966197183098592,
+      "grad_norm": 0.41494008898735046,
+      "learning_rate": 6.906063211093497e-05,
+      "loss": 1.9251,
+      "step": 930
+    },
+    {
+      "epoch": 1.39962441314554,
+      "grad_norm": 0.41713011264801025,
+      "learning_rate": 6.843436441655988e-05,
+      "loss": 1.9405,
+      "step": 932
+    },
+    {
+      "epoch": 1.4026291079812205,
+      "grad_norm": 0.40219846367836,
+      "learning_rate": 6.781010898646242e-05,
+      "loss": 1.9069,
+      "step": 934
+    },
+    {
+      "epoch": 1.4056338028169013,
+      "grad_norm": 0.4286425709724426,
+      "learning_rate": 6.718788122133056e-05,
+      "loss": 1.9887,
+      "step": 936
+    },
+    {
+      "epoch": 1.408638497652582,
+      "grad_norm": 0.4062979817390442,
+      "learning_rate": 6.656769647182872e-05,
+      "loss": 1.8869,
+      "step": 938
+    },
+    {
+      "epoch": 1.4116431924882629,
+      "grad_norm": 0.3840523362159729,
+      "learning_rate": 6.594957003821923e-05,
+      "loss": 1.9905,
+      "step": 940
+    },
+    {
+      "epoch": 1.4146478873239436,
+      "grad_norm": 0.3980202078819275,
+      "learning_rate": 6.533351716998465e-05,
+      "loss": 1.9786,
+      "step": 942
+    },
+    {
+      "epoch": 1.4176525821596244,
+      "grad_norm": 0.39504823088645935,
+      "learning_rate": 6.471955306545167e-05,
+      "loss": 1.9653,
+      "step": 944
+    },
+    {
+      "epoch": 1.4206572769953052,
+      "grad_norm": 0.4146779775619507,
+      "learning_rate": 6.410769287141632e-05,
+      "loss": 1.9739,
+      "step": 946
+    },
+    {
+      "epoch": 1.423661971830986,
+      "grad_norm": 0.40134918689727783,
+      "learning_rate": 6.349795168276994e-05,
+      "loss": 1.9669,
+      "step": 948
+    },
+    {
+      "epoch": 1.4266666666666667,
+      "grad_norm": 0.391028493642807,
+      "learning_rate": 6.289034454212702e-05,
+      "loss": 1.9049,
+      "step": 950
+    },
+    {
+      "epoch": 1.4266666666666667,
+      "eval_loss": 1.8079575300216675,
+      "eval_runtime": 2.2361,
+      "eval_samples_per_second": 15.205,
+      "eval_steps_per_second": 1.342,
+      "step": 950
+    },
+    {
+      "epoch": 1.4296713615023475,
+      "grad_norm": 0.3979133367538452,
+      "learning_rate": 6.228488643945408e-05,
+      "loss": 1.8975,
+      "step": 952
+    },
+    {
+      "epoch": 1.4326760563380283,
+      "grad_norm": 0.4187586009502411,
+      "learning_rate": 6.168159231169976e-05,
+      "loss": 1.9258,
+      "step": 954
+    },
+    {
+      "epoch": 1.4356807511737089,
+      "grad_norm": 0.42900320887565613,
+      "learning_rate": 6.108047704242634e-05,
+      "loss": 1.9716,
+      "step": 956
+    },
+    {
+      "epoch": 1.4386854460093896,
+      "grad_norm": 0.39041733741760254,
+      "learning_rate": 6.0481555461442723e-05,
+      "loss": 2.002,
+      "step": 958
+    },
+    {
+      "epoch": 1.4416901408450704,
+      "grad_norm": 0.3880235254764557,
+      "learning_rate": 5.988484234443842e-05,
+      "loss": 1.9451,
+      "step": 960
+    },
+    {
+      "epoch": 1.4446948356807512,
+      "grad_norm": 0.3954552412033081,
+      "learning_rate": 5.929035241261898e-05,
+      "loss": 1.9779,
+      "step": 962
+    },
+    {
+      "epoch": 1.447699530516432,
+      "grad_norm": 0.40889522433280945,
+      "learning_rate": 5.869810033234288e-05,
+      "loss": 1.9409,
+      "step": 964
+    },
+    {
+      "epoch": 1.4507042253521127,
+      "grad_norm": 0.40016794204711914,
+      "learning_rate": 5.810810071475973e-05,
+      "loss": 1.9904,
+      "step": 966
+    },
+    {
+      "epoch": 1.4537089201877933,
+      "grad_norm": 0.39795544743537903,
+      "learning_rate": 5.752036811544973e-05,
+      "loss": 2.0043,
+      "step": 968
+    },
+    {
+      "epoch": 1.456713615023474,
+      "grad_norm": 0.4015527367591858,
+      "learning_rate": 5.693491703406478e-05,
+      "loss": 1.9646,
+      "step": 970
+    },
+    {
+      "epoch": 1.4597183098591549,
+      "grad_norm": 0.39066001772880554,
+      "learning_rate": 5.635176191397047e-05,
+      "loss": 2.017,
+      "step": 972
+    },
+    {
+      "epoch": 1.4627230046948356,
+      "grad_norm": 0.3964717984199524,
+      "learning_rate": 5.5770917141889916e-05,
+      "loss": 1.911,
+      "step": 974
+    },
+    {
+      "epoch": 1.4657276995305164,
+      "grad_norm": 0.3941939175128937,
+      "learning_rate": 5.519239704754885e-05,
+      "loss": 2.0063,
+      "step": 976
+    },
+    {
+      "epoch": 1.4687323943661972,
+      "grad_norm": 0.4054701626300812,
+      "learning_rate": 5.461621590332202e-05,
+      "loss": 2.0763,
+      "step": 978
+    },
+    {
+      "epoch": 1.471737089201878,
+      "grad_norm": 0.407295286655426,
+      "learning_rate": 5.4042387923881117e-05,
+      "loss": 2.0058,
+      "step": 980
+    },
+    {
+      "epoch": 1.4747417840375587,
+      "grad_norm": 0.3966968059539795,
+      "learning_rate": 5.3470927265844195e-05,
+      "loss": 1.888,
+      "step": 982
+    },
+    {
+      "epoch": 1.4777464788732395,
+      "grad_norm": 0.40702012181282043,
+      "learning_rate": 5.290184802742632e-05,
+      "loss": 1.9624,
+      "step": 984
+    },
+    {
+      "epoch": 1.4807511737089203,
+      "grad_norm": 0.4187633991241455,
+      "learning_rate": 5.2335164248091635e-05,
+      "loss": 1.9762,
+      "step": 986
+    },
+    {
+      "epoch": 1.483755868544601,
+      "grad_norm": 0.4065389335155487,
+      "learning_rate": 5.1770889908207245e-05,
+      "loss": 1.918,
+      "step": 988
+    },
+    {
+      "epoch": 1.4867605633802816,
+      "grad_norm": 0.40223878622055054,
+      "learning_rate": 5.1209038928698146e-05,
+      "loss": 1.9291,
+      "step": 990
+    },
+    {
+      "epoch": 1.4897652582159624,
+      "grad_norm": 0.3900267481803894,
+      "learning_rate": 5.064962517070388e-05,
+      "loss": 2.0104,
+      "step": 992
+    },
+    {
+      "epoch": 1.4927699530516432,
+      "grad_norm": 0.40887948870658875,
+      "learning_rate": 5.0092662435236454e-05,
+      "loss": 1.9735,
+      "step": 994
+    },
+    {
+      "epoch": 1.495774647887324,
+      "grad_norm": 0.39081013202667236,
+      "learning_rate": 4.9538164462840135e-05,
+      "loss": 1.9952,
+      "step": 996
+    },
+    {
+      "epoch": 1.4987793427230047,
+      "grad_norm": 0.40461257100105286,
+      "learning_rate": 4.898614493325209e-05,
+      "loss": 1.9977,
+      "step": 998
+    },
+    {
+      "epoch": 1.5017840375586853,
+      "grad_norm": 0.40150025486946106,
+      "learning_rate": 4.843661746506516e-05,
+      "loss": 1.9494,
+      "step": 1000
+    },
+    {
+      "epoch": 1.5017840375586853,
+      "eval_loss": 1.804593801498413,
+      "eval_runtime": 2.2337,
+      "eval_samples_per_second": 15.222,
+      "eval_steps_per_second": 1.343,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 2,
+  "max_steps": 1332,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 50,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.6520962782795923e+19,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0aaceb523795fb69c67260904e208c7436cb5c9bd101a53dc5d0e085073ba911
+size 6097

vocab.json ADDED

The diff for this file is too large to render.