Training in progress, step 365, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +30 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +205 -0
last-checkpoint/trainer_state.json +2604 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: katuni4ka/tiny-random-dbrx
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "katuni4ka/tiny-random-dbrx",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "layer",
+    "out_proj",
+    "Wqkv"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
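
A note on the adapter configuration above: with `use_rslora` set to `false`, LoRA scales the low-rank update by `lora_alpha / r`, which for these values is 16 / 8 = 2.0. The sketch below recomputes that factor from a subset of the fields in the diff. The helper name `lora_scaling` is hypothetical and is not part of the PEFT API; it only mirrors the standard LoRA and rsLoRA scaling formulas.

```python
import json
import math

# Subset of the fields from adapter_config.json as they appear in the diff.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["layer", "out_proj", "Wqkv"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update B @ A.

    Hypothetical helper: standard LoRA uses lora_alpha / r,
    rank-stabilized LoRA uses lora_alpha / sqrt(r).
    """
    r = config["r"]
    if config.get("use_rslora", False):
        return config["lora_alpha"] / math.sqrt(r)
    return config["lora_alpha"] / r


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"LoRA scaling factor: {scaling}")
```

Had `use_rslora` been `true`, the same inputs would give 16 / sqrt(8) instead, so the flag matters when comparing runs that share `lora_alpha` and `r`.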

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb782cbdf99c1a1e902a330bb365a1298c4d159ee62de373e5e828ff27ab9634
+size 5752
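
The three added lines above are a Git LFS pointer, not the weights themselves: the pointer records the spec version, a SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not a function from any Git LFS tooling.

```python
# Pointer text copied from the adapter_model.safetensors diff (without the "+" prefixes).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eb782cbdf99c1a1e902a330bb365a1298c4d159ee62de373e5e828ff27ab9634
size 5752
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its key/value fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The same shape applies to the `optimizer.pt`, `rng_state.pth`, `scheduler.pt`, and `training_args.bin` entries in this commit.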

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 100279,
+  "<|im_start|>": 100278
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f435ce850777e13d0d23fc42025258ccda567de6f2c0b4fe7d7997231c816891
+size 15814

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9a5df6564cf48a50ae92cee490ec0d4eed87ac68b09129926b4f67ec9bc9b051
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:635160eaaf28a686b3665877a41b33c5d5f3ffb6419262657260d434b3f7650e
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|pad|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,205 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "100256": {
+      "content": "<||_unused_0_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100257": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100258": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100259": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100260": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100261": {
+      "content": "<||_unused_1_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100262": {
+      "content": "<||_unused_2_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100263": {
+      "content": "<||_unused_3_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100264": {
+      "content": "<||_unused_4_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100265": {
+      "content": "<||_unused_5_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100266": {
+      "content": "<||_unused_6_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100267": {
+      "content": "<||_unused_7_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100268": {
+      "content": "<||_unused_8_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100269": {
+      "content": "<||_unused_9_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100270": {
+      "content": "<||_unused_10_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100271": {
+      "content": "<||_unused_11_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100272": {
+      "content": "<||_unused_12_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100273": {
+      "content": "<||_unused_13_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100274": {
+      "content": "<||_unused_14_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100275": {
+      "content": "<||_unused_15_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100276": {
+      "content": "<|endofprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100277": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100278": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100279": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 32768,
+  "pad_token": "<|pad|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
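
The `chat_template` in the config above wraps each message as `<|start_header_id|>role<|end_header_id|>`, two newlines, the trimmed content, then `<|eot_id|>`. It prefixes the first message with `bos_token` and optionally appends an open assistant header. The sketch below re-expresses that logic in plain Python to make the layout easier to follow. It does not use the Jinja engine that actually renders the template, and `render_chat` is a hypothetical name. Note that the three header/end markers are not among the `added_tokens_decoder` entries listed in this file, so they may be tokenized as ordinary text; that is an inference from the config, not something this diff confirms.

```python
BOS_TOKEN = "<|endoftext|>"  # "bos_token" from tokenizer_config.json


def render_chat(messages, add_generation_prompt=False, bos_token=BOS_TOKEN):
    """Plain-Python equivalent of the chat_template logic (hypothetical helper)."""
    parts = []
    for index, message in enumerate(messages):
        # In the template, `| trim` binds to message['content'] only.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        if index == 0:
            # Only the first message is prefixed with the BOS token.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Leave an open assistant header for the model to continue from.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  Hello  "}],
    add_generation_prompt=True,
)
print(repr(prompt))
```

In practice the template would be applied through the tokenizer rather than reimplemented; the sketch is only meant to show what string the template produces.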

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2604 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.061595578618740245,
+  "eval_steps": 365,
+  "global_step": 365,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00016875500991435683,
+      "grad_norm": 1.940874608408194e-05,
+      "learning_rate": 2e-05,
+      "loss": 46.0,
+      "step": 1
+    },
+    {
+      "epoch": 0.00016875500991435683,
+      "eval_loss": 11.5,
+      "eval_runtime": 14.7961,
+      "eval_samples_per_second": 168.625,
+      "eval_steps_per_second": 84.346,
+      "step": 1
+    },
+    {
+      "epoch": 0.00033751001982871366,
+      "grad_norm": 1.7339452824671753e-05,
+      "learning_rate": 4e-05,
+      "loss": 46.0,
+      "step": 2
+    },
+    {
+      "epoch": 0.0005062650297430705,
+      "grad_norm": 9.871354450297076e-06,
+      "learning_rate": 6e-05,
+      "loss": 46.0,
+      "step": 3
+    },
+    {
+      "epoch": 0.0006750200396574273,
+      "grad_norm": 1.9611639800132252e-05,
+      "learning_rate": 8e-05,
+      "loss": 46.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.0008437750495717841,
+      "grad_norm": 1.9497307221172377e-05,
+      "learning_rate": 0.0001,
+      "loss": 46.0,
+      "step": 5
+    },
+    {
+      "epoch": 0.001012530059486141,
+      "grad_norm": 1.4163069863570854e-05,
+      "learning_rate": 0.00012,
+      "loss": 46.0,
+      "step": 6
+    },
+    {
+      "epoch": 0.0011812850694004977,
+      "grad_norm": 2.7470567147247493e-05,
+      "learning_rate": 0.00014,
+      "loss": 46.0,
+      "step": 7
+    },
+    {
+      "epoch": 0.0013500400793148546,
+      "grad_norm": 1.262454861716833e-05,
+      "learning_rate": 0.00016,
+      "loss": 46.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.0015187950892292116,
+      "grad_norm": 1.2461353435355704e-05,
+      "learning_rate": 0.00018,
+      "loss": 46.0,
+      "step": 9
+    },
+    {
+      "epoch": 0.0016875500991435683,
+      "grad_norm": 1.924686148413457e-05,
+      "learning_rate": 0.0002,
+      "loss": 46.0,
+      "step": 10
+    },
+    {
+      "epoch": 0.0018563051090579252,
+      "grad_norm": 1.5145715224207379e-05,
+      "learning_rate": 0.0001999997643146886,
+      "loss": 46.0,
+      "step": 11
+    },
+    {
+      "epoch": 0.002025060118972282,
+      "grad_norm": 8.845885531627573e-06,
+      "learning_rate": 0.0001999990572598653,
+      "loss": 46.0,
+      "step": 12
+    },
+    {
+      "epoch": 0.0021938151288866388,
+      "grad_norm": 1.8190678019891493e-05,
+      "learning_rate": 0.00019999787883886297,
+      "loss": 46.0,
+      "step": 13
+    },
+    {
+      "epoch": 0.0023625701388009955,
+      "grad_norm": 1.827279083954636e-05,
+      "learning_rate": 0.00019999622905723634,
+      "loss": 46.0,
+      "step": 14
+    },
+    {
+      "epoch": 0.0025313251487153526,
+      "grad_norm": 1.6565853002248332e-05,
+      "learning_rate": 0.00019999410792276198,
+      "loss": 46.0,
+      "step": 15
+    },
+    {
+      "epoch": 0.0027000801586297093,
+      "grad_norm": 1.0454502444190439e-05,
+      "learning_rate": 0.00019999151544543832,
+      "loss": 46.0,
+      "step": 16
+    },
+    {
+      "epoch": 0.002868835168544066,
+      "grad_norm": 1.3837036021868698e-05,
+      "learning_rate": 0.00019998845163748553,
+      "loss": 46.0,
+      "step": 17
+    },
+    {
+      "epoch": 0.003037590178458423,
+      "grad_norm": 1.835626426327508e-05,
+      "learning_rate": 0.0001999849165133455,
+      "loss": 46.0,
+      "step": 18
+    },
+    {
+      "epoch": 0.00320634518837278,
+      "grad_norm": 1.9169588995282538e-05,
+      "learning_rate": 0.00019998091008968175,
+      "loss": 46.0,
+      "step": 19
+    },
+    {
+      "epoch": 0.0033751001982871365,
+      "grad_norm": 9.931905879057012e-06,
+      "learning_rate": 0.0001999764323853794,
+      "loss": 46.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.0035438552082014936,
+      "grad_norm": 1.246378269570414e-05,
+      "learning_rate": 0.00019997148342154502,
+      "loss": 46.0,
+      "step": 21
+    },
+    {
+      "epoch": 0.0037126102181158503,
+      "grad_norm": 1.627793062652927e-05,
+      "learning_rate": 0.0001999660632215066,
+      "loss": 46.0,
+      "step": 22
+    },
+    {
+      "epoch": 0.003881365228030207,
+      "grad_norm": 1.6465271983179264e-05,
+      "learning_rate": 0.00019996017181081336,
+      "loss": 46.0,
+      "step": 23
+    },
+    {
+      "epoch": 0.004050120237944564,
+      "grad_norm": 1.4562248907168396e-05,
+      "learning_rate": 0.00019995380921723562,
+      "loss": 46.0,
+      "step": 24
+    },
+    {
+      "epoch": 0.00421887524785892,
+      "grad_norm": 1.0674185432435479e-05,
+      "learning_rate": 0.00019994697547076487,
+      "loss": 46.0,
+      "step": 25
+    },
+    {
+      "epoch": 0.0043876302577732776,
+      "grad_norm": 1.2923982467327733e-05,
+      "learning_rate": 0.00019993967060361335,
+      "loss": 46.0,
+      "step": 26
+    },
+    {
+      "epoch": 0.004556385267687635,
+      "grad_norm": 1.9432607587077655e-05,
+      "learning_rate": 0.00019993189465021405,
+      "loss": 46.0,
+      "step": 27
+    },
+    {
+      "epoch": 0.004725140277601991,
+      "grad_norm": 1.4471233043877874e-05,
+      "learning_rate": 0.0001999236476472205,
+      "loss": 46.0,
+      "step": 28
+    },
+    {
+      "epoch": 0.004893895287516348,
+      "grad_norm": 2.0513718482106924e-05,
+      "learning_rate": 0.0001999149296335067,
+      "loss": 46.0,
+      "step": 29
+    },
+    {
+      "epoch": 0.005062650297430705,
+      "grad_norm": 2.3202137526823208e-05,
+      "learning_rate": 0.00019990574065016677,
+      "loss": 46.0,
+      "step": 30
+    },
+    {
+      "epoch": 0.0052314053073450615,
+      "grad_norm": 2.1836229279870167e-05,
+      "learning_rate": 0.00019989608074051489,
+      "loss": 46.0,
+      "step": 31
+    },
+    {
+      "epoch": 0.005400160317259419,
+      "grad_norm": 1.5274166798917577e-05,
+      "learning_rate": 0.00019988594995008505,
+      "loss": 46.0,
+      "step": 32
+    },
+    {
+      "epoch": 0.005568915327173776,
+      "grad_norm": 1.919052010634914e-05,
+      "learning_rate": 0.00019987534832663082,
+      "loss": 46.0,
+      "step": 33
+    },
+    {
+      "epoch": 0.005737670337088132,
+      "grad_norm": 1.5861112842685543e-05,
+      "learning_rate": 0.0001998642759201251,
+      "loss": 46.0,
+      "step": 34
+    },
+    {
+      "epoch": 0.005906425347002489,
+      "grad_norm": 1.185925975732971e-05,
+      "learning_rate": 0.00019985273278276,
+      "loss": 46.0,
+      "step": 35
+    },
+    {
+      "epoch": 0.006075180356916846,
+      "grad_norm": 1.6194346244446933e-05,
+      "learning_rate": 0.00019984071896894646,
+      "loss": 46.0,
+      "step": 36
+    },
+    {
+      "epoch": 0.0062439353668312025,
+      "grad_norm": 1.961121779459063e-05,
+      "learning_rate": 0.0001998282345353141,
+      "loss": 46.0,
+      "step": 37
+    },
+    {
+      "epoch": 0.00641269037674556,
+      "grad_norm": 2.1559202650678344e-05,
+      "learning_rate": 0.0001998152795407108,
+      "loss": 46.0,
+      "step": 38
+    },
+    {
+      "epoch": 0.006581445386659917,
+      "grad_norm": 2.8157497581560165e-05,
+      "learning_rate": 0.00019980185404620268,
+      "loss": 46.0,
+      "step": 39
+    },
+    {
+      "epoch": 0.006750200396574273,
+      "grad_norm": 2.1486024706973694e-05,
+      "learning_rate": 0.00019978795811507354,
+      "loss": 46.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.00691895540648863,
+      "grad_norm": 2.3358212274615653e-05,
+      "learning_rate": 0.00019977359181282473,
+      "loss": 46.0,
+      "step": 41
+    },
+    {
+      "epoch": 0.007087710416402987,
+      "grad_norm": 1.6756239347159863e-05,
+      "learning_rate": 0.00019975875520717479,
+      "loss": 46.0,
+      "step": 42
+    },
+    {
+      "epoch": 0.0072564654263173435,
+      "grad_norm": 1.7175016182591207e-05,
+      "learning_rate": 0.00019974344836805905,
+      "loss": 46.0,
+      "step": 43
+    },
+    {
+      "epoch": 0.007425220436231701,
+      "grad_norm": 1.2200899618619587e-05,
+      "learning_rate": 0.00019972767136762953,
+      "loss": 46.0,
+      "step": 44
+    },
+    {
+      "epoch": 0.007593975446146058,
+      "grad_norm": 1.7348913388559595e-05,
+      "learning_rate": 0.00019971142428025433,
+      "loss": 46.0,
+      "step": 45
+    },
+    {
+      "epoch": 0.007762730456060414,
+      "grad_norm": 9.565149412082974e-06,
+      "learning_rate": 0.00019969470718251748,
+      "loss": 46.0,
+      "step": 46
+    },
+    {
+      "epoch": 0.007931485465974771,
+      "grad_norm": 1.5444033124367706e-05,
+      "learning_rate": 0.00019967752015321845,
+      "loss": 46.0,
+      "step": 47
+    },
+    {
+      "epoch": 0.008100240475889128,
+      "grad_norm": 1.9607326976256445e-05,
+      "learning_rate": 0.00019965986327337185,
+      "loss": 46.0,
+      "step": 48
+    },
+    {
+      "epoch": 0.008268995485803485,
+      "grad_norm": 1.6280893760267645e-05,
+      "learning_rate": 0.00019964173662620702,
+      "loss": 46.0,
+      "step": 49
+    },
+    {
+      "epoch": 0.00843775049571784,
+      "grad_norm": 2.2253505449043587e-05,
+      "learning_rate": 0.00019962314029716766,
+      "loss": 46.0,
+      "step": 50
+    },
+    {
+      "epoch": 0.008606505505632198,
+      "grad_norm": 1.3872278032067697e-05,
+      "learning_rate": 0.0001996040743739114,
+      "loss": 46.0,
+      "step": 51
+    },
+    {
+      "epoch": 0.008775260515546555,
+      "grad_norm": 3.644825119408779e-05,
+      "learning_rate": 0.0001995845389463094,
+      "loss": 46.0,
+      "step": 52
+    },
+    {
+      "epoch": 0.008944015525460912,
+      "grad_norm": 1.858348332461901e-05,
+      "learning_rate": 0.00019956453410644592,
+      "loss": 46.0,
+      "step": 53
+    },
+    {
+      "epoch": 0.00911277053537527,
+      "grad_norm": 1.7900563761941157e-05,
+      "learning_rate": 0.0001995440599486179,
+      "loss": 46.0,
+      "step": 54
+    },
+    {
+      "epoch": 0.009281525545289626,
+      "grad_norm": 4.377702498459257e-05,
+      "learning_rate": 0.0001995231165693345,
+      "loss": 46.0,
+      "step": 55
+    },
+    {
+      "epoch": 0.009450280555203982,
+      "grad_norm": 2.3857590349507518e-05,
+      "learning_rate": 0.00019950170406731667,
+      "loss": 46.0,
+      "step": 56
+    },
+    {
+      "epoch": 0.009619035565118339,
+      "grad_norm": 1.2674429854087066e-05,
+      "learning_rate": 0.00019947982254349666,
+      "loss": 46.0,
+      "step": 57
+    },
+    {
+      "epoch": 0.009787790575032696,
+      "grad_norm": 2.890539326472208e-05,
+      "learning_rate": 0.00019945747210101754,
+      "loss": 46.0,
+      "step": 58
+    },
+    {
+      "epoch": 0.009956545584947053,
+      "grad_norm": 1.9787186829489656e-05,
+      "learning_rate": 0.0001994346528452327,
+      "loss": 46.0,
+      "step": 59
+    },
+    {
+      "epoch": 0.01012530059486141,
+      "grad_norm": 2.427671461191494e-05,
+      "learning_rate": 0.00019941136488370542,
+      "loss": 46.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.010294055604775768,
+      "grad_norm": 1.2431184586603194e-05,
+      "learning_rate": 0.00019938760832620834,
+      "loss": 46.0,
+      "step": 61
+    },
+    {
+      "epoch": 0.010462810614690123,
+      "grad_norm": 2.0812987713725306e-05,
+      "learning_rate": 0.00019936338328472287,
+      "loss": 46.0,
+      "step": 62
+    },
+    {
+      "epoch": 0.01063156562460448,
+      "grad_norm": 2.8024591301800683e-05,
+      "learning_rate": 0.00019933868987343875,
+      "loss": 46.0,
+      "step": 63
+    },
+    {
+      "epoch": 0.010800320634518837,
+      "grad_norm": 2.2569249267689884e-05,
+      "learning_rate": 0.0001993135282087535,
+      "loss": 46.0,
+      "step": 64
+    },
+    {
+      "epoch": 0.010969075644433194,
+      "grad_norm": 1.9383338440093212e-05,
+      "learning_rate": 0.0001992878984092717,
+      "loss": 46.0,
+      "step": 65
+    },
+    {
+      "epoch": 0.011137830654347551,
+      "grad_norm": 2.542112815717701e-05,
+      "learning_rate": 0.00019926180059580482,
+      "loss": 46.0,
+      "step": 66
+    },
+    {
+      "epoch": 0.011306585664261909,
+      "grad_norm": 1.598012568138074e-05,
+      "learning_rate": 0.00019923523489137024,
+      "loss": 46.0,
+      "step": 67
+    },
+    {
+      "epoch": 0.011475340674176264,
+      "grad_norm": 1.4789104170631617e-05,
+      "learning_rate": 0.00019920820142119085,
+      "loss": 46.0,
+      "step": 68
+    },
+    {
+      "epoch": 0.011644095684090621,
+      "grad_norm": 1.9799528672592714e-05,
+      "learning_rate": 0.00019918070031269453,
+      "loss": 46.0,
+      "step": 69
+    },
+    {
+      "epoch": 0.011812850694004978,
+      "grad_norm": 3.249241126468405e-05,
+      "learning_rate": 0.00019915273169551342,
+      "loss": 46.0,
+      "step": 70
+    },
+    {
+      "epoch": 0.011981605703919335,
+      "grad_norm": 2.287813367729541e-05,
+      "learning_rate": 0.00019912429570148339,
+      "loss": 46.0,
+      "step": 71
+    },
+    {
+      "epoch": 0.012150360713833692,
+      "grad_norm": 3.186216781614348e-05,
+      "learning_rate": 0.0001990953924646433,
+      "loss": 46.0,
+      "step": 72
+    },
+    {
+      "epoch": 0.01231911572374805,
+      "grad_norm": 4.087354682269506e-05,
+      "learning_rate": 0.00019906602212123455,
+      "loss": 46.0,
+      "step": 73
+    },
+    {
+      "epoch": 0.012487870733662405,
+      "grad_norm": 2.5896191800711676e-05,
+      "learning_rate": 0.00019903618480970035,
+      "loss": 46.0,
+      "step": 74
+    },
+    {
+      "epoch": 0.012656625743576762,
+      "grad_norm": 2.6261466700816527e-05,
+      "learning_rate": 0.00019900588067068493,
+      "loss": 46.0,
+      "step": 75
+    },
+    {
+      "epoch": 0.01282538075349112,
+      "grad_norm": 1.815898940549232e-05,
+      "learning_rate": 0.0001989751098470332,
+      "loss": 46.0,
+      "step": 76
+    },
+    {
+      "epoch": 0.012994135763405476,
+      "grad_norm": 1.682456240814645e-05,
+      "learning_rate": 0.0001989438724837897,
+      "loss": 46.0,
+      "step": 77
+    },
+    {
+      "epoch": 0.013162890773319834,
+      "grad_norm": 2.581811168056447e-05,
+      "learning_rate": 0.00019891216872819825,
+      "loss": 46.0,
+      "step": 78
+    },
+    {
+      "epoch": 0.013331645783234189,
+      "grad_norm": 3.8674785173498094e-05,
+      "learning_rate": 0.00019887999872970097,
+      "loss": 46.0,
+      "step": 79
+    },
+    {
+      "epoch": 0.013500400793148546,
+      "grad_norm": 3.877016933984123e-05,
+      "learning_rate": 0.00019884736263993784,
+      "loss": 46.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.013669155803062903,
+      "grad_norm": 1.5791521946084686e-05,
+      "learning_rate": 0.0001988142606127458,
+      "loss": 46.0,
+      "step": 81
+    },
+    {
+      "epoch": 0.01383791081297726,
+      "grad_norm": 2.4946857593022287e-05,
+      "learning_rate": 0.00019878069280415803,
+      "loss": 46.0,
+      "step": 82
+    },
+    {
+      "epoch": 0.014006665822891617,
+      "grad_norm": 4.878333129454404e-05,
+      "learning_rate": 0.00019874665937240335,
+      "loss": 46.0,
+      "step": 83
+    },
+    {
+      "epoch": 0.014175420832805975,
+      "grad_norm": 2.5513558284728788e-05,
+      "learning_rate": 0.00019871216047790538,
+      "loss": 46.0,
+      "step": 84
+    },
+    {
+      "epoch": 0.01434417584272033,
+      "grad_norm": 5.085681550554e-05,
+      "learning_rate": 0.00019867719628328175,
+      "loss": 46.0,
+      "step": 85
+    },
+    {
+      "epoch": 0.014512930852634687,
+      "grad_norm": 4.014965088572353e-05,
+      "learning_rate": 0.0001986417669533434,
+      "loss": 46.0,
+      "step": 86
+    },
+    {
+      "epoch": 0.014681685862549044,
+      "grad_norm": 2.0565448721754365e-05,
+      "learning_rate": 0.0001986058726550938,
+      "loss": 46.0,
+      "step": 87
+    },
+    {
+      "epoch": 0.014850440872463401,
+      "grad_norm": 4.422978236107156e-05,
+      "learning_rate": 0.00019856951355772814,
+      "loss": 46.0,
+      "step": 88
+    },
+    {
+      "epoch": 0.015019195882377758,
+      "grad_norm": 2.699590550037101e-05,
+      "learning_rate": 0.00019853268983263244,
+      "loss": 46.0,
+      "step": 89
+    },
+    {
+      "epoch": 0.015187950892292116,
+      "grad_norm": 1.274027908948483e-05,
+      "learning_rate": 0.000198495401653383,
+      "loss": 46.0,
+      "step": 90
+    },
+    {
+      "epoch": 0.015356705902206471,
+      "grad_norm": 4.096307384315878e-05,
+      "learning_rate": 0.00019845764919574537,
+      "loss": 46.0,
+      "step": 91
+    },
+    {
+      "epoch": 0.015525460912120828,
+      "grad_norm": 3.1611998565495014e-05,
+      "learning_rate": 0.00019841943263767346,
+      "loss": 46.0,
+      "step": 92
+    },
+    {
+      "epoch": 0.015694215922035185,
+      "grad_norm": 2.0112831407459453e-05,
+      "learning_rate": 0.00019838075215930894,
+      "loss": 46.0,
+      "step": 93
+    },
+    {
+      "epoch": 0.015862970931949542,
+      "grad_norm": 6.471107190009207e-05,
+      "learning_rate": 0.00019834160794298024,
+      "loss": 46.0,
+      "step": 94
+    },
+    {
+      "epoch": 0.0160317259418639,
+      "grad_norm": 3.331500920467079e-05,
+      "learning_rate": 0.00019830200017320168,
+      "loss": 46.0,
+      "step": 95
+    },
+    {
+      "epoch": 0.016200480951778257,
+      "grad_norm": 3.4283220884390175e-05,
+      "learning_rate": 0.0001982619290366726,
+      "loss": 46.0,
+      "step": 96
+    },
+    {
+      "epoch": 0.016369235961692614,
+      "grad_norm": 4.784906195709482e-05,
+      "learning_rate": 0.00019822139472227665,
+      "loss": 46.0,
+      "step": 97
+    },
+    {
+      "epoch": 0.01653799097160697,
+      "grad_norm": 4.9155318265547976e-05,
+      "learning_rate": 0.00019818039742108064,
+      "loss": 46.0,
+      "step": 98
+    },
+    {
+      "epoch": 0.016706745981521328,
+      "grad_norm": 4.837827509618364e-05,
+      "learning_rate": 0.00019813893732633378,
+      "loss": 46.0,
+      "step": 99
+    },
+    {
+      "epoch": 0.01687550099143568,
+      "grad_norm": 4.3958363676210865e-05,
+      "learning_rate": 0.00019809701463346683,
+      "loss": 46.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.01704425600135004,
+      "grad_norm": 5.13470804435201e-05,
+      "learning_rate": 0.000198054629540091,
+      "loss": 46.0,
+      "step": 101
+    },
+    {
+      "epoch": 0.017213011011264396,
+      "grad_norm": 3.207432382623665e-05,
+      "learning_rate": 0.00019801178224599722,
+      "loss": 46.0,
+      "step": 102
+    },
+    {
+      "epoch": 0.017381766021178753,
+      "grad_norm": 3.24075881508179e-05,
+      "learning_rate": 0.00019796847295315502,
+      "loss": 46.0,
+      "step": 103
+    },
+    {
+      "epoch": 0.01755052103109311,
+      "grad_norm": 5.3687395848101005e-05,
+      "learning_rate": 0.00019792470186571167,
+      "loss": 46.0,
+      "step": 104
+    },
+    {
+      "epoch": 0.017719276041007467,
+      "grad_norm": 2.9774340873700567e-05,
+      "learning_rate": 0.00019788046918999122,
+      "loss": 46.0,
+      "step": 105
+    },
+    {
+      "epoch": 0.017888031050921824,
+      "grad_norm": 7.565080159110948e-05,
+      "learning_rate": 0.00019783577513449353,
+      "loss": 46.0,
+      "step": 106
+    },
+    {
+      "epoch": 0.01805678606083618,
+      "grad_norm": 5.2741965191671625e-05,
+      "learning_rate": 0.0001977906199098932,
+      "loss": 46.0,
+      "step": 107
+    },
+    {
+      "epoch": 0.01822554107075054,
+      "grad_norm": 3.216122786398046e-05,
+      "learning_rate": 0.0001977450037290388,
+      "loss": 46.0,
+      "step": 108
+    },
+    {
+      "epoch": 0.018394296080664896,
+      "grad_norm": 5.457305451272987e-05,
+      "learning_rate": 0.00019769892680695147,
+      "loss": 46.0,
+      "step": 109
+    },
+    {
+      "epoch": 0.018563051090579253,
+      "grad_norm": 3.489471419015899e-05,
+      "learning_rate": 0.00019765238936082438,
+      "loss": 46.0,
+      "step": 110
+    },
+    {
+      "epoch": 0.01873180610049361,
+      "grad_norm": 1.886937752715312e-05,
+      "learning_rate": 0.00019760539161002135,
+      "loss": 46.0,
+      "step": 111
+    },
+    {
+      "epoch": 0.018900561110407964,
+      "grad_norm": 3.381875285413116e-05,
+      "learning_rate": 0.00019755793377607597,
+      "loss": 46.0,
+      "step": 112
+    },
+    {
+      "epoch": 0.01906931612032232,
+      "grad_norm": 3.3628173696342856e-05,
+      "learning_rate": 0.00019751001608269052,
+      "loss": 46.0,
+      "step": 113
+    },
+    {
+      "epoch": 0.019238071130236678,
+      "grad_norm": 5.391196464188397e-05,
+      "learning_rate": 0.00019746163875573492,
+      "loss": 46.0,
+      "step": 114
+    },
+    {
+      "epoch": 0.019406826140151035,
+      "grad_norm": 4.308508141548373e-05,
+      "learning_rate": 0.0001974128020232457,
+      "loss": 46.0,
+      "step": 115
+    },
+    {
+      "epoch": 0.019575581150065392,
+      "grad_norm": 0.00010606838623061776,
+      "learning_rate": 0.00019736350611542487,
+      "loss": 46.0,
+      "step": 116
+    },
+    {
+      "epoch": 0.01974433615997975,
+      "grad_norm": 4.744268153444864e-05,
+      "learning_rate": 0.00019731375126463886,
+      "loss": 46.0,
+      "step": 117
+    },
+    {
+      "epoch": 0.019913091169894107,
+      "grad_norm": 3.2672236557118595e-05,
+      "learning_rate": 0.00019726353770541742,
+      "loss": 46.0,
+      "step": 118
+    },
+    {
+      "epoch": 0.020081846179808464,
+      "grad_norm": 1.8585633370094e-05,
+      "learning_rate": 0.0001972128656744525,
+      "loss": 46.0,
+      "step": 119
+    },
+    {
+      "epoch": 0.02025060118972282,
+      "grad_norm": 3.3655844163149595e-05,
+      "learning_rate": 0.0001971617354105972,
+      "loss": 46.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.020419356199637178,
+      "grad_norm": 4.9799295084085315e-05,
+      "learning_rate": 0.00019711014715486448,
+      "loss": 46.0,
+      "step": 121
+    },
+    {
+      "epoch": 0.020588111209551535,
+      "grad_norm": 7.914419256849214e-05,
+      "learning_rate": 0.00019705810115042634,
+      "loss": 46.0,
+      "step": 122
+    },
+    {
+      "epoch": 0.02075686621946589,
+      "grad_norm": 4.802920011570677e-05,
+      "learning_rate": 0.00019700559764261225,
+      "loss": 46.0,
+      "step": 123
+    },
+    {
+      "epoch": 0.020925621229380246,
+      "grad_norm": 3.76962598238606e-05,
+      "learning_rate": 0.0001969526368789084,
+      "loss": 46.0,
+      "step": 124
+    },
+    {
+      "epoch": 0.021094376239294603,
+      "grad_norm": 3.57206336047966e-05,
+      "learning_rate": 0.00019689921910895627,
+      "loss": 46.0,
+      "step": 125
+    },
+    {
+      "epoch": 0.02126313124920896,
+      "grad_norm": 0.0001358168519800529,
+      "learning_rate": 0.00019684534458455145,
+      "loss": 46.0,
+      "step": 126
+    },
+    {
+      "epoch": 0.021431886259123317,
+      "grad_norm": 3.319705865578726e-05,
+      "learning_rate": 0.0001967910135596427,
+      "loss": 46.0,
+      "step": 127
+    },
+    {
+      "epoch": 0.021600641269037674,
+      "grad_norm": 9.154703002423048e-05,
+      "learning_rate": 0.0001967362262903305,
+      "loss": 46.0,
+      "step": 128
+    },
+    {
+      "epoch": 0.02176939627895203,
+      "grad_norm": 0.00012708237045444548,
+      "learning_rate": 0.00019668098303486593,
+      "loss": 46.0,
+      "step": 129
+    },
+    {
+      "epoch": 0.02193815128886639,
+      "grad_norm": 5.1937749958597124e-05,
+      "learning_rate": 0.00019662528405364947,
+      "loss": 46.0,
+      "step": 130
+    },
+    {
+      "epoch": 0.022106906298780746,
+      "grad_norm": 6.14839227637276e-05,
+      "learning_rate": 0.00019656912960922974,
+      "loss": 46.0,
+      "step": 131
+    },
+    {
+      "epoch": 0.022275661308695103,
+      "grad_norm": 5.0448226829757914e-05,
+      "learning_rate": 0.0001965125199663023,
+      "loss": 46.0,
+      "step": 132
+    },
+    {
+      "epoch": 0.02244441631860946,
+      "grad_norm": 0.00013035547453910112,
+      "learning_rate": 0.0001964554553917084,
+      "loss": 46.0,
+      "step": 133
+    },
+    {
+      "epoch": 0.022613171328523817,
+      "grad_norm": 5.22616392117925e-05,
+      "learning_rate": 0.00019639793615443366,
+      "loss": 46.0,
+      "step": 134
+    },
+    {
+      "epoch": 0.02278192633843817,
+      "grad_norm": 7.795252167852595e-05,
+      "learning_rate": 0.00019633996252560687,
+      "loss": 46.0,
+      "step": 135
+    },
+    {
+      "epoch": 0.022950681348352528,
+      "grad_norm": 0.0001024070952553302,
+      "learning_rate": 0.00019628153477849867,
+      "loss": 46.0,
+      "step": 136
+    },
+    {
+      "epoch": 0.023119436358266885,
+      "grad_norm": 6.724517152179033e-05,
+      "learning_rate": 0.00019622265318852033,
+      "loss": 46.0,
+      "step": 137
+    },
+    {
+      "epoch": 0.023288191368181242,
+      "grad_norm": 6.319572275970131e-05,
+      "learning_rate": 0.00019616331803322236,
+      "loss": 46.0,
+      "step": 138
+    },
+    {
+      "epoch": 0.0234569463780956,
+      "grad_norm": 6.086541907279752e-05,
+      "learning_rate": 0.0001961035295922932,
+      "loss": 46.0,
+      "step": 139
+    },
+    {
+      "epoch": 0.023625701388009956,
+      "grad_norm": 5.147139745531604e-05,
+      "learning_rate": 0.00019604328814755808,
+      "loss": 46.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.023794456397924314,
+      "grad_norm": 6.334174395306036e-05,
+      "learning_rate": 0.0001959825939829774,
+      "loss": 46.0,
+      "step": 141
+    },
+    {
+      "epoch": 0.02396321140783867,
+      "grad_norm": 8.245484787039459e-05,
+      "learning_rate": 0.00019592144738464566,
+      "loss": 46.0,
+      "step": 142
+    },
+    {
+      "epoch": 0.024131966417753028,
+      "grad_norm": 3.768013630178757e-05,
+      "learning_rate": 0.00019585984864078996,
+      "loss": 46.0,
+      "step": 143
+    },
+    {
+      "epoch": 0.024300721427667385,
+      "grad_norm": 5.31747609784361e-05,
+      "learning_rate": 0.0001957977980417687,
+      "loss": 46.0,
+      "step": 144
+    },
+    {
+      "epoch": 0.024469476437581742,
+      "grad_norm": 6.340059917420149e-05,
+      "learning_rate": 0.00019573529588007011,
+      "loss": 46.0,
+      "step": 145
+    },
+    {
+      "epoch": 0.0246382314474961,
+      "grad_norm": 8.484098361805081e-05,
+      "learning_rate": 0.00019567234245031106,
+      "loss": 46.0,
+      "step": 146
+    },
+    {
+      "epoch": 0.024806986457410453,
+      "grad_norm": 7.689618360018358e-05,
+      "learning_rate": 0.00019560893804923554,
+      "loss": 46.0,
+      "step": 147
+    },
+    {
+      "epoch": 0.02497574146732481,
+      "grad_norm": 6.701362144667655e-05,
+      "learning_rate": 0.00019554508297571328,
+      "loss": 46.0,
+      "step": 148
+    },
+    {
+      "epoch": 0.025144496477239167,
+      "grad_norm": 4.832363629247993e-05,
+      "learning_rate": 0.00019548077753073827,
+      "loss": 46.0,
+      "step": 149
+    },
+    {
+      "epoch": 0.025313251487153524,
+      "grad_norm": 0.00010464687511557713,
+      "learning_rate": 0.00019541602201742755,
+      "loss": 46.0,
+      "step": 150
+    },
+    {
+      "epoch": 0.02548200649706788,
+      "grad_norm": 5.5594293371541426e-05,
+      "learning_rate": 0.00019535081674101955,
+      "loss": 46.0,
+      "step": 151
+    },
+    {
+      "epoch": 0.02565076150698224,
+      "grad_norm": 0.00010436464071972296,
+      "learning_rate": 0.0001952851620088728,
+      "loss": 46.0,
+      "step": 152
+    },
+    {
+      "epoch": 0.025819516516896596,
+      "grad_norm": 7.730885408818722e-05,
+      "learning_rate": 0.00019521905813046445,
+      "loss": 46.0,
+      "step": 153
+    },
+    {
+      "epoch": 0.025988271526810953,
+      "grad_norm": 0.0002586382324807346,
+      "learning_rate": 0.00019515250541738872,
+      "loss": 46.0,
+      "step": 154
+    },
+    {
+      "epoch": 0.02615702653672531,
+      "grad_norm": 5.2660256187664345e-05,
+      "learning_rate": 0.00019508550418335555,
+      "loss": 46.0,
+      "step": 155
+    },
+    {
+      "epoch": 0.026325781546639667,
+      "grad_norm": 6.58825520076789e-05,
+      "learning_rate": 0.00019501805474418912,
+      "loss": 46.0,
+      "step": 156
+    },
+    {
+      "epoch": 0.026494536556554024,
+      "grad_norm": 9.116072760662064e-05,
+      "learning_rate": 0.00019495015741782622,
+      "loss": 46.0,
+      "step": 157
+    },
+    {
+      "epoch": 0.026663291566468378,
+      "grad_norm": 0.00010203113924944773,
+      "learning_rate": 0.00019488181252431489,
+      "loss": 46.0,
+      "step": 158
+    },
+    {
+      "epoch": 0.026832046576382735,
+      "grad_norm": 9.367840539198369e-05,
+      "learning_rate": 0.00019481302038581294,
+      "loss": 46.0,
+      "step": 159
+    },
+    {
+      "epoch": 0.027000801586297092,
+      "grad_norm": 5.867075742571615e-05,
+      "learning_rate": 0.00019474378132658626,
+      "loss": 46.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.02716955659621145,
+      "grad_norm": 0.0001331541279796511,
+      "learning_rate": 0.00019467409567300745,
+      "loss": 46.0,
+      "step": 161
+    },
+    {
+      "epoch": 0.027338311606125806,
+      "grad_norm": 9.494357800576836e-05,
+      "learning_rate": 0.0001946039637535542,
+      "loss": 46.0,
+      "step": 162
+    },
+    {
+      "epoch": 0.027507066616040163,
+      "grad_norm": 0.00018060464935842901,
+      "learning_rate": 0.0001945333858988078,
+      "loss": 46.0,
+      "step": 163
+    },
+    {
+      "epoch": 0.02767582162595452,
+      "grad_norm": 9.109014354180545e-05,
+      "learning_rate": 0.0001944623624414515,
+      "loss": 46.0,
+      "step": 164
+    },
+    {
+      "epoch": 0.027844576635868878,
+      "grad_norm": 0.00021458794071804732,
+      "learning_rate": 0.00019439089371626903,
+      "loss": 46.0,
+      "step": 165
+    },
+    {
+      "epoch": 0.028013331645783235,
+      "grad_norm": 0.00023161491844803095,
+      "learning_rate": 0.0001943189800601429,
+      "loss": 46.0,
+      "step": 166
+    },
+    {
+      "epoch": 0.028182086655697592,
+      "grad_norm": 0.0001091673257178627,
+      "learning_rate": 0.00019424662181205307,
+      "loss": 46.0,
+      "step": 167
+    },
+    {
+      "epoch": 0.02835084166561195,
+      "grad_norm": 9.839528502197936e-05,
+      "learning_rate": 0.00019417381931307497,
+      "loss": 46.0,
+      "step": 168
+    },
+    {
+      "epoch": 0.028519596675526306,
+      "grad_norm": 0.0001077549095498398,
+      "learning_rate": 0.00019410057290637824,
+      "loss": 46.0,
+      "step": 169
+    },
+    {
+      "epoch": 0.02868835168544066,
+      "grad_norm": 0.00011080451076850295,
+      "learning_rate": 0.0001940268829372249,
+      "loss": 46.0,
+      "step": 170
+    },
+    {
+      "epoch": 0.028857106695355017,
+      "grad_norm": 0.00010105837282026187,
+      "learning_rate": 0.00019395274975296786,
+      "loss": 46.0,
+      "step": 171
+    },
+    {
+      "epoch": 0.029025861705269374,
+      "grad_norm": 0.00012236724433023483,
+      "learning_rate": 0.0001938781737030491,
+      "loss": 46.0,
+      "step": 172
+    },
+    {
+      "epoch": 0.02919461671518373,
+      "grad_norm": 8.416602213401347e-05,
+      "learning_rate": 0.00019380315513899826,
+      "loss": 46.0,
+      "step": 173
+    },
+    {
+      "epoch": 0.02936337172509809,
+      "grad_norm": 0.00017547875177115202,
+      "learning_rate": 0.00019372769441443083,
+      "loss": 46.0,
+      "step": 174
+    },
+    {
+      "epoch": 0.029532126735012446,
+      "grad_norm": 0.00010037582251243293,
+      "learning_rate": 0.00019365179188504647,
+      "loss": 46.0,
+      "step": 175
+    },
+    {
+      "epoch": 0.029700881744926803,
+      "grad_norm": 0.0001204924556077458,
+      "learning_rate": 0.0001935754479086274,
+      "loss": 46.0,
+      "step": 176
+    },
+    {
+      "epoch": 0.02986963675484116,
+      "grad_norm": 0.00014140504936221987,
+      "learning_rate": 0.00019349866284503674,
+      "loss": 46.0,
+      "step": 177
+    },
+    {
+      "epoch": 0.030038391764755517,
+      "grad_norm": 9.342600969830528e-05,
+      "learning_rate": 0.00019342143705621662,
+      "loss": 46.0,
+      "step": 178
+    },
+    {
+      "epoch": 0.030207146774669874,
+      "grad_norm": 4.463369259610772e-05,
+      "learning_rate": 0.00019334377090618682,
+      "loss": 46.0,
+      "step": 179
+    },
+    {
+      "epoch": 0.03037590178458423,
+      "grad_norm": 8.116533717839047e-05,
+      "learning_rate": 0.00019326566476104274,
+      "loss": 46.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.03054465679449859,
+      "grad_norm": 0.00013790494995191693,
+      "learning_rate": 0.00019318711898895377,
+      "loss": 46.0,
+      "step": 181
+    },
+    {
+      "epoch": 0.030713411804412942,
+      "grad_norm": 0.0002199001028202474,
+      "learning_rate": 0.00019310813396016162,
+      "loss": 46.0,
+      "step": 182
+    },
+    {
+      "epoch": 0.0308821668143273,
+      "grad_norm": 0.0002289148687850684,
+      "learning_rate": 0.0001930287100469785,
+      "loss": 46.0,
+      "step": 183
+    },
+    {
+      "epoch": 0.031050921824241656,
+      "grad_norm": 0.00022609223378822207,
+      "learning_rate": 0.00019294884762378547,
+      "loss": 46.0,
+      "step": 184
+    },
+    {
+      "epoch": 0.031219676834156013,
+      "grad_norm": 0.00014787810505367815,
+      "learning_rate": 0.00019286854706703044,
+      "loss": 46.0,
+      "step": 185
+    },
+    {
+      "epoch": 0.03138843184407037,
+      "grad_norm": 0.00017034618940670043,
+      "learning_rate": 0.00019278780875522667,
+      "loss": 46.0,
+      "step": 186
+    },
+    {
+      "epoch": 0.03155718685398473,
+      "grad_norm": 0.0001577001967234537,
+      "learning_rate": 0.0001927066330689509,
+      "loss": 46.0,
+      "step": 187
+    },
+    {
+      "epoch": 0.031725941863899085,
+      "grad_norm": 0.0001635671651456505,
+      "learning_rate": 0.0001926250203908414,
+      "loss": 46.0,
+      "step": 188
+    },
+    {
+      "epoch": 0.03189469687381344,
+      "grad_norm": 0.00011218619329156354,
+      "learning_rate": 0.00019254297110559638,
+      "loss": 46.0,
+      "step": 189
+    },
+    {
+      "epoch": 0.0320634518837278,
+      "grad_norm": 0.0001787557266652584,
+      "learning_rate": 0.0001924604855999721,
+      "loss": 46.0,
+      "step": 190
+    },
+    {
+      "epoch": 0.03223220689364215,
+      "grad_norm": 0.00014260809984989464,
+      "learning_rate": 0.00019237756426278095,
+      "loss": 46.0,
+      "step": 191
+    },
+    {
+      "epoch": 0.03240096190355651,
+      "grad_norm": 0.00012893076927866787,
+      "learning_rate": 0.00019229420748488978,
+      "loss": 46.0,
+      "step": 192
+    },
+    {
+      "epoch": 0.03256971691347087,
+      "grad_norm": 0.00022735691163688898,
+      "learning_rate": 0.00019221041565921796,
+      "loss": 46.0,
+      "step": 193
+    },
+    {
+      "epoch": 0.03273847192338523,
+      "grad_norm": 0.00011990263010375202,
+      "learning_rate": 0.0001921261891807355,
+      "loss": 46.0,
+      "step": 194
+    },
+    {
+      "epoch": 0.03290722693329958,
+      "grad_norm": 0.00017182013834826648,
+      "learning_rate": 0.00019204152844646134,
+      "loss": 46.0,
+      "step": 195
+    },
+    {
+      "epoch": 0.03307598194321394,
+      "grad_norm": 0.00017154582019429654,
+      "learning_rate": 0.00019195643385546126,
+      "loss": 46.0,
+      "step": 196
+    },
+    {
+      "epoch": 0.033244736953128295,
+      "grad_norm": 0.0001479845232097432,
+      "learning_rate": 0.00019187090580884622,
+      "loss": 46.0,
+      "step": 197
+    },
+    {
+      "epoch": 0.033413491963042656,
+      "grad_norm": 0.00010758084681583568,
+      "learning_rate": 0.00019178494470977023,
+      "loss": 46.0,
+      "step": 198
+    },
+    {
+      "epoch": 0.03358224697295701,
+      "grad_norm": 0.0001167198788607493,
+      "learning_rate": 0.0001916985509634287,
+      "loss": 46.0,
+      "step": 199
+    },
+    {
+      "epoch": 0.03375100198287136,
+      "grad_norm": 0.00015376460214611143,
+      "learning_rate": 0.00019161172497705637,
+      "loss": 46.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.033919756992785724,
+      "grad_norm": 0.0001339185400865972,
+      "learning_rate": 0.00019152446715992543,
+      "loss": 46.0,
+      "step": 201
+    },
+    {
+      "epoch": 0.03408851200270008,
+      "grad_norm": 0.00018876604735851288,
+      "learning_rate": 0.0001914367779233436,
+      "loss": 46.0,
+      "step": 202
+    },
+    {
+      "epoch": 0.03425726701261444,
+      "grad_norm": 0.00017353007569909096,
+      "learning_rate": 0.00019134865768065216,
+      "loss": 46.0,
+      "step": 203
+    },
+    {
+      "epoch": 0.03442602202252879,
+      "grad_norm": 0.00011807784903794527,
+      "learning_rate": 0.00019126010684722406,
+      "loss": 46.0,
+      "step": 204
+    },
+    {
+      "epoch": 0.03459477703244315,
+      "grad_norm": 0.00010566677519818768,
+      "learning_rate": 0.00019117112584046193,
+      "loss": 46.0,
+      "step": 205
+    },
+    {
+      "epoch": 0.034763532042357506,
+      "grad_norm": 0.0001312542735831812,
+      "learning_rate": 0.00019108171507979606,
+      "loss": 46.0,
+      "step": 206
+    },
+    {
+      "epoch": 0.03493228705227187,
+      "grad_norm": 5.670605969498865e-05,
+      "learning_rate": 0.00019099187498668256,
+      "loss": 46.0,
+      "step": 207
+    },
+    {
+      "epoch": 0.03510104206218622,
+      "grad_norm": 8.242291369242594e-05,
+      "learning_rate": 0.0001909016059846012,
+      "loss": 46.0,
+      "step": 208
+    },
+    {
+      "epoch": 0.03526979707210058,
+      "grad_norm": 0.0001418525935150683,
+      "learning_rate": 0.00019081090849905355,
+      "loss": 46.0,
+      "step": 209
+    },
+    {
+      "epoch": 0.035438552082014935,
+      "grad_norm": 0.0002694391005206853,
+      "learning_rate": 0.00019071978295756087,
+      "loss": 46.0,
+      "step": 210
+    },
+    {
+      "epoch": 0.03560730709192929,
+      "grad_norm": 0.00015816248196642846,
+      "learning_rate": 0.0001906282297896623,
+      "loss": 46.0,
+      "step": 211
+    },
+    {
+      "epoch": 0.03577606210184365,
+      "grad_norm": 0.00011155927495565265,
+      "learning_rate": 0.00019053624942691247,
+      "loss": 46.0,
+      "step": 212
+    },
+    {
+      "epoch": 0.035944817111758,
+      "grad_norm": 0.00010293432569596916,
+      "learning_rate": 0.0001904438423028798,
+      "loss": 46.0,
+      "step": 213
+    },
+    {
+      "epoch": 0.03611357212167236,
+      "grad_norm": 0.00017549478798173368,
+      "learning_rate": 0.00019035100885314438,
+      "loss": 46.0,
+      "step": 214
+    },
+    {
+      "epoch": 0.03628232713158672,
+      "grad_norm": 8.048818563111126e-05,
+      "learning_rate": 0.0001902577495152958,
+      "loss": 46.0,
+      "step": 215
+    },
+    {
+      "epoch": 0.03645108214150108,
+      "grad_norm": 7.043426012387499e-05,
+      "learning_rate": 0.0001901640647289312,
+      "loss": 46.0,
+      "step": 216
+    },
+    {
+      "epoch": 0.03661983715141543,
+      "grad_norm": 0.00022185473062563688,
+      "learning_rate": 0.00019006995493565305,
+      "loss": 46.0,
+      "step": 217
+    },
+    {
+      "epoch": 0.03678859216132979,
+      "grad_norm": 0.0002446068392600864,
+      "learning_rate": 0.0001899754205790674,
+      "loss": 46.0,
+      "step": 218
+    },
+    {
+      "epoch": 0.036957347171244145,
+      "grad_norm": 0.0002539333945605904,
+      "learning_rate": 0.00018988046210478132,
+      "loss": 46.0,
+      "step": 219
+    },
+    {
+      "epoch": 0.037126102181158506,
+      "grad_norm": 0.0001403048081556335,
+      "learning_rate": 0.00018978507996040124,
+      "loss": 46.0,
+      "step": 220
+    },
+    {
+      "epoch": 0.03729485719107286,
+      "grad_norm": 0.00014873422333039343,
+      "learning_rate": 0.00018968927459553055,
+      "loss": 46.0,
+      "step": 221
+    },
+    {
+      "epoch": 0.03746361220098722,
+      "grad_norm": 0.00015955405251588672,
+      "learning_rate": 0.00018959304646176754,
+      "loss": 46.0,
+      "step": 222
+    },
+    {
+      "epoch": 0.037632367210901574,
+      "grad_norm": 0.0003200356150045991,
+      "learning_rate": 0.00018949639601270347,
+      "loss": 46.0,
+      "step": 223
+    },
+    {
+      "epoch": 0.03780112222081593,
+      "grad_norm": 0.00014675638522021472,
+      "learning_rate": 0.00018939932370392004,
+      "loss": 46.0,
+      "step": 224
+    },
+    {
+      "epoch": 0.03796987723073029,
+      "grad_norm": 0.00022176875791046768,
+      "learning_rate": 0.00018930182999298768,
+      "loss": 46.0,
+      "step": 225
+    },
+    {
+      "epoch": 0.03813863224064464,
+      "grad_norm": 0.00029244759934954345,
+      "learning_rate": 0.0001892039153394631,
+      "loss": 46.0,
+      "step": 226
+    },
+    {
+      "epoch": 0.038307387250559,
+      "grad_norm": 0.00019421910110395402,
+      "learning_rate": 0.0001891055802048872,
+      "loss": 46.0,
+      "step": 227
+    },
+    {
+      "epoch": 0.038476142260473356,
+      "grad_norm": 0.00012899210560135543,
+      "learning_rate": 0.00018900682505278287,
+      "loss": 46.0,
+      "step": 228
+    },
+    {
+      "epoch": 0.03864489727038772,
+      "grad_norm": 0.00016150598821695894,
+      "learning_rate": 0.00018890765034865295,
+      "loss": 46.0,
+      "step": 229
+    },
+    {
+      "epoch": 0.03881365228030207,
+      "grad_norm": 0.0004213732318021357,
+      "learning_rate": 0.00018880805655997784,
+      "loss": 46.0,
+      "step": 230
+    },
+    {
+      "epoch": 0.03898240729021643,
+      "grad_norm": 0.0001324907352682203,
+      "learning_rate": 0.0001887080441562134,
+      "loss": 46.0,
+      "step": 231
+    },
+    {
+      "epoch": 0.039151162300130785,
+      "grad_norm": 0.00029545003781095147,
+      "learning_rate": 0.0001886076136087887,
+      "loss": 46.0,
+      "step": 232
+    },
+    {
+      "epoch": 0.039319917310045145,
+      "grad_norm": 0.00018010212806984782,
+      "learning_rate": 0.00018850676539110386,
+      "loss": 46.0,
+      "step": 233
+    },
+    {
+      "epoch": 0.0394886723199595,
+      "grad_norm": 0.00014782045036554337,
+      "learning_rate": 0.00018840549997852776,
+      "loss": 46.0,
+      "step": 234
+    },
+    {
+      "epoch": 0.03965742732987385,
+      "grad_norm": 0.0002910486946348101,
+      "learning_rate": 0.0001883038178483958,
+      "loss": 46.0,
+      "step": 235
+    },
+    {
+      "epoch": 0.03982618233978821,
+      "grad_norm": 0.00010300084977643564,
+      "learning_rate": 0.00018820171948000764,
+      "loss": 46.0,
+      "step": 236
+    },
+    {
+      "epoch": 0.03999493734970257,
+      "grad_norm": 0.00029963982524350286,
+      "learning_rate": 0.00018809920535462502,
+      "loss": 46.0,
+      "step": 237
+    },
+    {
+      "epoch": 0.04016369235961693,
+      "grad_norm": 0.0002475226647220552,
+      "learning_rate": 0.00018799627595546942,
+      "loss": 46.0,
+      "step": 238
+    },
+    {
+      "epoch": 0.04033244736953128,
+      "grad_norm": 9.632138244342059e-05,
+      "learning_rate": 0.00018789293176771978,
+      "loss": 46.0,
+      "step": 239
+    },
+    {
+      "epoch": 0.04050120237944564,
+      "grad_norm": 0.00018374540377408266,
+      "learning_rate": 0.00018778917327851025,
+      "loss": 46.0,
+      "step": 240
+    },
+    {
+      "epoch": 0.040669957389359995,
+      "grad_norm": 0.0005576743860729039,
+      "learning_rate": 0.00018768500097692784,
+      "loss": 46.0,
+      "step": 241
+    },
+    {
+      "epoch": 0.040838712399274356,
+      "grad_norm": 0.0002332236763322726,
+      "learning_rate": 0.00018758041535401018,
+      "loss": 46.0,
+      "step": 242
+    },
+    {
+      "epoch": 0.04100746740918871,
+      "grad_norm": 0.00021743084653280675,
+      "learning_rate": 0.00018747541690274325,
+      "loss": 46.0,
+      "step": 243
+    },
+    {
+      "epoch": 0.04117622241910307,
+      "grad_norm": 0.00037613531458191574,
+      "learning_rate": 0.00018737000611805877,
+      "loss": 46.0,
+      "step": 244
+    },
+    {
+      "epoch": 0.041344977429017424,
+      "grad_norm": 0.00017969420878216624,
+      "learning_rate": 0.00018726418349683231,
+      "loss": 46.0,
+      "step": 245
+    },
+    {
+      "epoch": 0.04151373243893178,
+      "grad_norm": 0.00021221920906100422,
+      "learning_rate": 0.00018715794953788059,
+      "loss": 46.0,
+      "step": 246
+    },
+    {
+      "epoch": 0.04168248744884614,
+      "grad_norm": 0.0002182903845096007,
+      "learning_rate": 0.0001870513047419593,
+      "loss": 46.0,
+      "step": 247
+    },
+    {
+      "epoch": 0.04185124245876049,
+      "grad_norm": 0.00023534568026661873,
+      "learning_rate": 0.00018694424961176065,
+      "loss": 46.0,
+      "step": 248
+    },
+    {
+      "epoch": 0.04201999746867485,
+      "grad_norm": 0.00013651238987222314,
+      "learning_rate": 0.00018683678465191108,
+      "loss": 46.0,
+      "step": 249
+    },
+    {
+      "epoch": 0.042188752478589206,
+      "grad_norm": 0.000251735036727041,
+      "learning_rate": 0.00018672891036896884,
+      "loss": 46.0,
+      "step": 250
+    },
+    {
+      "epoch": 0.04235750748850357,
+      "grad_norm": 0.0005651676910929382,
+      "learning_rate": 0.00018662062727142165,
+      "loss": 46.0,
+      "step": 251
+    },
+    {
+      "epoch": 0.04252626249841792,
+      "grad_norm": 0.00027543309261091053,
+      "learning_rate": 0.00018651193586968417,
+      "loss": 46.0,
+      "step": 252
+    },
+    {
+      "epoch": 0.04269501750833228,
+      "grad_norm": 0.00025643999106250703,
+      "learning_rate": 0.00018640283667609574,
+      "loss": 46.0,
+      "step": 253
+    },
+    {
+      "epoch": 0.042863772518246634,
+      "grad_norm": 0.00026899727527052164,
+      "learning_rate": 0.00018629333020491796,
+      "loss": 46.0,
+      "step": 254
+    },
+    {
+      "epoch": 0.043032527528160995,
+      "grad_norm": 0.0002454131608828902,
+      "learning_rate": 0.00018618341697233213,
+      "loss": 46.0,
+      "step": 255
+    },
+    {
+      "epoch": 0.04320128253807535,
+      "grad_norm": 0.00020008228602819145,
+      "learning_rate": 0.0001860730974964369,
+      "loss": 46.0,
+      "step": 256
+    },
+    {
+      "epoch": 0.04337003754798971,
+      "grad_norm": 0.0003180755884386599,
+      "learning_rate": 0.00018596237229724595,
+      "loss": 46.0,
+      "step": 257
+    },
+    {
+      "epoch": 0.04353879255790406,
+      "grad_norm": 0.00027137139113619924,
+      "learning_rate": 0.0001858512418966853,
+      "loss": 46.0,
+      "step": 258
+    },
+    {
+      "epoch": 0.04370754756781842,
+      "grad_norm": 0.0003248886205255985,
+      "learning_rate": 0.000185739706818591,
+      "loss": 46.0,
+      "step": 259
+    },
+    {
+      "epoch": 0.04387630257773278,
+      "grad_norm": 0.00022937578614801168,
+      "learning_rate": 0.00018562776758870663,
+      "loss": 46.0,
+      "step": 260
+    },
+    {
+      "epoch": 0.04404505758764713,
+      "grad_norm": 0.00026010029250755906,
+      "learning_rate": 0.0001855154247346809,
+      "loss": 46.0,
+      "step": 261
+    },
+    {
+      "epoch": 0.04421381259756149,
+      "grad_norm": 0.00012881477596238256,
+      "learning_rate": 0.00018540267878606497,
+      "loss": 46.0,
+      "step": 262
+    },
+    {
+      "epoch": 0.044382567607475845,
+      "grad_norm": 0.0001669221237534657,
+      "learning_rate": 0.0001852895302743101,
+      "loss": 46.0,
+      "step": 263
+    },
+    {
+      "epoch": 0.044551322617390206,
+      "grad_norm": 8.989863272290677e-05,
+      "learning_rate": 0.0001851759797327652,
+      "loss": 46.0,
+      "step": 264
+    },
+    {
+      "epoch": 0.04472007762730456,
+      "grad_norm": 0.00043415901018306613,
+      "learning_rate": 0.00018506202769667413,
+      "loss": 46.0,
+      "step": 265
+    },
+    {
+      "epoch": 0.04488883263721892,
+      "grad_norm": 0.00018308595463167876,
+      "learning_rate": 0.00018494767470317333,
+      "loss": 46.0,
+      "step": 266
+    },
+    {
+      "epoch": 0.045057587647133274,
+      "grad_norm": 0.0005433953483588994,
+      "learning_rate": 0.00018483292129128914,
+      "loss": 46.0,
+      "step": 267
+    },
+    {
+      "epoch": 0.045226342657047634,
+      "grad_norm": 0.0001997397339437157,
+      "learning_rate": 0.00018471776800193553,
+      "loss": 46.0,
+      "step": 268
+    },
+    {
+      "epoch": 0.04539509766696199,
+      "grad_norm": 0.0002549294731579721,
+      "learning_rate": 0.00018460221537791122,
+      "loss": 46.0,
+      "step": 269
+    },
+    {
+      "epoch": 0.04556385267687634,
+      "grad_norm": 0.0004634494544006884,
+      "learning_rate": 0.00018448626396389738,
+      "loss": 46.0,
+      "step": 270
+    },
+    {
+      "epoch": 0.0457326076867907,
+      "grad_norm": 0.00015057041309773922,
+      "learning_rate": 0.00018436991430645488,
+      "loss": 46.0,
+      "step": 271
+    },
+    {
+      "epoch": 0.045901362696705056,
+      "grad_norm": 0.00030916737159714103,
+      "learning_rate": 0.00018425316695402181,
+      "loss": 46.0,
+      "step": 272
+    },
+    {
+      "epoch": 0.046070117706619416,
+      "grad_norm": 0.0003056859422940761,
+      "learning_rate": 0.00018413602245691092,
+      "loss": 46.0,
+      "step": 273
+    },
+    {
+      "epoch": 0.04623887271653377,
+      "grad_norm": 0.00028438231674954295,
+      "learning_rate": 0.00018401848136730698,
+      "loss": 46.0,
+      "step": 274
+    },
+    {
+      "epoch": 0.04640762772644813,
+      "grad_norm": 0.00034854307887144387,
+      "learning_rate": 0.00018390054423926406,
+      "loss": 46.0,
+      "step": 275
+    },
+    {
+      "epoch": 0.046576382736362484,
+      "grad_norm": 0.00025173407630063593,
+      "learning_rate": 0.00018378221162870326,
+      "loss": 46.0,
+      "step": 276
+    },
+    {
+      "epoch": 0.046745137746276845,
+      "grad_norm": 0.0006683963001705706,
+      "learning_rate": 0.00018366348409340965,
+      "loss": 46.0,
+      "step": 277
+    },
+    {
+      "epoch": 0.0469138927561912,
+      "grad_norm": 0.00017825645045377314,
+      "learning_rate": 0.00018354436219303,
+      "loss": 46.0,
+      "step": 278
+    },
+    {
+      "epoch": 0.04708264776610556,
+      "grad_norm": 0.00026973988860845566,
+      "learning_rate": 0.00018342484648906996,
+      "loss": 46.0,
+      "step": 279
+    },
+    {
+      "epoch": 0.04725140277601991,
+      "grad_norm": 0.00027474435046315193,
+      "learning_rate": 0.00018330493754489138,
+      "loss": 46.0,
+      "step": 280
+    },
+    {
+      "epoch": 0.04742015778593427,
+      "grad_norm": 0.00025235096109099686,
+      "learning_rate": 0.00018318463592570988,
+      "loss": 46.0,
+      "step": 281
+    },
+    {
+      "epoch": 0.04758891279584863,
+      "grad_norm": 0.00020448811119422317,
+      "learning_rate": 0.0001830639421985919,
+      "loss": 46.0,
+      "step": 282
+    },
+    {
+      "epoch": 0.04775766780576298,
+      "grad_norm": 0.00028648230363614857,
+      "learning_rate": 0.00018294285693245223,
+      "loss": 46.0,
+      "step": 283
+    },
+    {
+      "epoch": 0.04792642281567734,
+      "grad_norm": 0.00027214092551730573,
+      "learning_rate": 0.00018282138069805127,
+      "loss": 46.0,
+      "step": 284
+    },
+    {
+      "epoch": 0.048095177825591695,
+      "grad_norm": 0.00021831200865563005,
+      "learning_rate": 0.00018269951406799223,
+      "loss": 46.0,
+      "step": 285
+    },
+    {
+      "epoch": 0.048263932835506056,
+      "grad_norm": 0.000336469296598807,
+      "learning_rate": 0.00018257725761671866,
+      "loss": 46.0,
+      "step": 286
+    },
+    {
+      "epoch": 0.04843268784542041,
+      "grad_norm": 0.00034162108204327524,
+      "learning_rate": 0.00018245461192051157,
+      "loss": 46.0,
+      "step": 287
+    },
+    {
+      "epoch": 0.04860144285533477,
+      "grad_norm": 0.00035298787406645715,
+      "learning_rate": 0.00018233157755748669,
+      "loss": 46.0,
+      "step": 288
+    },
+    {
+      "epoch": 0.048770197865249124,
+      "grad_norm": 0.00015676271868869662,
+      "learning_rate": 0.0001822081551075919,
+      "loss": 46.0,
+      "step": 289
+    },
+    {
+      "epoch": 0.048938952875163484,
+      "grad_norm": 0.00023883357062004507,
+      "learning_rate": 0.0001820843451526044,
+      "loss": 46.0,
+      "step": 290
+    },
+    {
+      "epoch": 0.04910770788507784,
+      "grad_norm": 0.0002669495588634163,
+      "learning_rate": 0.0001819601482761278,
+      "loss": 46.0,
+      "step": 291
+    },
+    {
+      "epoch": 0.0492764628949922,
+      "grad_norm": 0.00034910603426396847,
+      "learning_rate": 0.0001818355650635899,
+      "loss": 46.0,
+      "step": 292
+    },
+    {
+      "epoch": 0.04944521790490655,
+      "grad_norm": 0.00024713424500077963,
+      "learning_rate": 0.0001817105961022392,
+      "loss": 46.0,
+      "step": 293
+    },
+    {
+      "epoch": 0.049613972914820906,
+      "grad_norm": 0.0002389974833931774,
+      "learning_rate": 0.00018158524198114278,
+      "loss": 46.0,
+      "step": 294
+    },
+    {
+      "epoch": 0.049782727924735266,
+      "grad_norm": 0.00028152199229225516,
+      "learning_rate": 0.0001814595032911831,
+      "loss": 46.0,
+      "step": 295
+    },
+    {
+      "epoch": 0.04995148293464962,
+      "grad_norm": 0.0004107706481590867,
+      "learning_rate": 0.00018133338062505534,
+      "loss": 46.0,
+      "step": 296
+    },
+    {
+      "epoch": 0.05012023794456398,
+      "grad_norm": 0.00032018640195019543,
+      "learning_rate": 0.00018120687457726478,
+      "loss": 46.0,
+      "step": 297
+    },
+    {
+      "epoch": 0.050288992954478334,
+      "grad_norm": 0.0003432031662669033,
+      "learning_rate": 0.00018107998574412376,
+      "loss": 46.0,
+      "step": 298
+    },
+    {
+      "epoch": 0.050457747964392695,
+      "grad_norm": 0.00042492407374083996,
+      "learning_rate": 0.00018095271472374892,
+      "loss": 46.0,
+      "step": 299
+    },
+    {
+      "epoch": 0.05062650297430705,
+      "grad_norm": 0.0004913901793770492,
+      "learning_rate": 0.00018082506211605852,
+      "loss": 46.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.05079525798422141,
+      "grad_norm": 0.00045284689986146986,
+      "learning_rate": 0.00018069702852276941,
+      "loss": 46.0,
+      "step": 301
+    },
+    {
+      "epoch": 0.05096401299413576,
+      "grad_norm": 0.0002827131829690188,
+      "learning_rate": 0.00018056861454739432,
+      "loss": 46.0,
+      "step": 302
+    },
+    {
+      "epoch": 0.05113276800405012,
+      "grad_norm": 0.000371127447579056,
+      "learning_rate": 0.00018043982079523905,
+      "loss": 46.0,
+      "step": 303
+    },
+    {
+      "epoch": 0.05130152301396448,
+      "grad_norm": 0.0001193537755170837,
+      "learning_rate": 0.00018031064787339947,
+      "loss": 46.0,
+      "step": 304
+    },
+    {
+      "epoch": 0.05147027802387883,
+      "grad_norm": 0.0002617594145704061,
+      "learning_rate": 0.00018018109639075886,
+      "loss": 46.0,
+      "step": 305
+    },
+    {
+      "epoch": 0.05163903303379319,
+      "grad_norm": 0.0003127566596958786,
+      "learning_rate": 0.00018005116695798476,
+      "loss": 46.0,
+      "step": 306
+    },
+    {
+      "epoch": 0.051807788043707545,
+      "grad_norm": 0.0003426918410696089,
+      "learning_rate": 0.00017992086018752638,
+      "loss": 46.0,
+      "step": 307
+    },
+    {
+      "epoch": 0.051976543053621906,
+      "grad_norm": 0.000528005592059344,
+      "learning_rate": 0.0001797901766936116,
+      "loss": 46.0,
+      "step": 308
+    },
+    {
+      "epoch": 0.05214529806353626,
+      "grad_norm": 0.00044293675455264747,
+      "learning_rate": 0.00017965911709224395,
+      "loss": 46.0,
+      "step": 309
+    },
+    {
+      "epoch": 0.05231405307345062,
+      "grad_norm": 0.0004309279902372509,
+      "learning_rate": 0.00017952768200119992,
+      "loss": 46.0,
+      "step": 310
+    },
+    {
+      "epoch": 0.052482808083364973,
+      "grad_norm": 0.00038651516661047935,
+      "learning_rate": 0.0001793958720400259,
+      "loss": 46.0,
+      "step": 311
+    },
+    {
+      "epoch": 0.052651563093279334,
+      "grad_norm": 0.000503813847899437,
+      "learning_rate": 0.00017926368783003537,
+      "loss": 46.0,
+      "step": 312
+    },
+    {
+      "epoch": 0.05282031810319369,
+      "grad_norm": 0.0007580111850984395,
+      "learning_rate": 0.00017913112999430584,
+      "loss": 46.0,
+      "step": 313
+    },
+    {
+      "epoch": 0.05298907311310805,
+      "grad_norm": 0.0003376993117853999,
+      "learning_rate": 0.00017899819915767598,
+      "loss": 46.0,
+      "step": 314
+    },
+    {
+      "epoch": 0.0531578281230224,
+      "grad_norm": 0.0003070792299695313,
+      "learning_rate": 0.00017886489594674273,
+      "loss": 46.0,
+      "step": 315
+    },
+    {
+      "epoch": 0.053326583132936756,
+      "grad_norm": 0.0006369905895553529,
+      "learning_rate": 0.00017873122098985826,
+      "loss": 46.0,
+      "step": 316
+    },
+    {
+      "epoch": 0.053495338142851116,
+      "grad_norm": 0.0005446787690743804,
+      "learning_rate": 0.00017859717491712707,
+      "loss": 46.0,
+      "step": 317
+    },
+    {
+      "epoch": 0.05366409315276547,
+      "grad_norm": 0.0003436711267568171,
+      "learning_rate": 0.0001784627583604029,
+      "loss": 46.0,
+      "step": 318
+    },
+    {
+      "epoch": 0.05383284816267983,
+      "grad_norm": 0.0004495533648878336,
+      "learning_rate": 0.000178327971953286,
+      "loss": 46.0,
+      "step": 319
+    },
+    {
+      "epoch": 0.054001603172594184,
+      "grad_norm": 0.0004997072974219918,
+      "learning_rate": 0.00017819281633111984,
+      "loss": 46.0,
+      "step": 320
+    },
+    {
+      "epoch": 0.054170358182508545,
+      "grad_norm": 0.00040087226079776883,
+      "learning_rate": 0.0001780572921309883,
+      "loss": 46.0,
+      "step": 321
+    },
+    {
+      "epoch": 0.0543391131924229,
+      "grad_norm": 0.0005316153983585536,
+      "learning_rate": 0.0001779213999917127,
+      "loss": 46.0,
+      "step": 322
+    },
+    {
+      "epoch": 0.05450786820233726,
+      "grad_norm": 0.0006078218575567007,
+      "learning_rate": 0.00017778514055384866,
+      "loss": 46.0,
+      "step": 323
+    },
+    {
+      "epoch": 0.05467662321225161,
+      "grad_norm": 0.0004828522796742618,
+      "learning_rate": 0.00017764851445968308,
+      "loss": 46.0,
+      "step": 324
+    },
+    {
+      "epoch": 0.05484537822216597,
+      "grad_norm": 0.00032997396192513406,
+      "learning_rate": 0.0001775115223532313,
+      "loss": 46.0,
+      "step": 325
+    },
+    {
+      "epoch": 0.05501413323208033,
+      "grad_norm": 0.00033607761724852026,
+      "learning_rate": 0.00017737416488023384,
+      "loss": 46.0,
+      "step": 326
+    },
+    {
+      "epoch": 0.05518288824199469,
+      "grad_norm": 0.0004014720907434821,
+      "learning_rate": 0.00017723644268815344,
+      "loss": 46.0,
+      "step": 327
+    },
+    {
+      "epoch": 0.05535164325190904,
+      "grad_norm": 0.000525740790180862,
+      "learning_rate": 0.00017709835642617212,
+      "loss": 46.0,
+      "step": 328
+    },
+    {
+      "epoch": 0.055520398261823395,
+      "grad_norm": 0.0007568416767753661,
+      "learning_rate": 0.00017695990674518788,
+      "loss": 46.0,
+      "step": 329
+    },
+    {
+      "epoch": 0.055689153271737755,
+      "grad_norm": 0.0004670285852625966,
+      "learning_rate": 0.0001768210942978119,
+      "loss": 46.0,
+      "step": 330
+    },
+    {
+      "epoch": 0.05585790828165211,
+      "grad_norm": 0.0005972511135041714,
+      "learning_rate": 0.00017668191973836529,
+      "loss": 46.0,
+      "step": 331
+    },
+    {
+      "epoch": 0.05602666329156647,
+      "grad_norm": 0.00048381276428699493,
+      "learning_rate": 0.000176542383722876,
+      "loss": 46.0,
+      "step": 332
+    },
+    {
+      "epoch": 0.05619541830148082,
+      "grad_norm": 0.0007428377284668386,
+      "learning_rate": 0.0001764024869090758,
+      "loss": 46.0,
+      "step": 333
+    },
+    {
+      "epoch": 0.056364173311395184,
+      "grad_norm": 0.000608109578024596,
+      "learning_rate": 0.00017626222995639724,
+      "loss": 46.0,
+      "step": 334
+    },
+    {
+      "epoch": 0.05653292832130954,
+      "grad_norm": 0.0006571222911588848,
+      "learning_rate": 0.00017612161352597032,
+      "loss": 46.0,
+      "step": 335
+    },
+    {
+      "epoch": 0.0567016833312239,
+      "grad_norm": 0.0007276976830326021,
+      "learning_rate": 0.00017598063828061958,
+      "loss": 46.0,
+      "step": 336
+    },
+    {
+      "epoch": 0.05687043834113825,
+      "grad_norm": 0.0007646044250577688,
+      "learning_rate": 0.000175839304884861,
+      "loss": 46.0,
+      "step": 337
+    },
+    {
+      "epoch": 0.05703919335105261,
+      "grad_norm": 0.0004487757687456906,
+      "learning_rate": 0.00017569761400489862,
+      "loss": 46.0,
+      "step": 338
+    },
+    {
+      "epoch": 0.057207948360966966,
+      "grad_norm": 0.0004590119933709502,
+      "learning_rate": 0.0001755555663086216,
+      "loss": 46.0,
+      "step": 339
+    },
+    {
+      "epoch": 0.05737670337088132,
+      "grad_norm": 0.00019634263298939914,
+      "learning_rate": 0.0001754131624656011,
+      "loss": 46.0,
+      "step": 340
+    },
+    {
+      "epoch": 0.05754545838079568,
+      "grad_norm": 0.0008665765053592622,
+      "learning_rate": 0.00017527040314708702,
+      "loss": 46.0,
+      "step": 341
+    },
+    {
+      "epoch": 0.057714213390710034,
+      "grad_norm": 0.0010710041970014572,
+      "learning_rate": 0.0001751272890260048,
+      "loss": 46.0,
+      "step": 342
+    },
+    {
+      "epoch": 0.057882968400624395,
+      "grad_norm": 0.0010121267987415195,
+      "learning_rate": 0.0001749838207769524,
+      "loss": 46.0,
+      "step": 343
+    },
+    {
+      "epoch": 0.05805172341053875,
+      "grad_norm": 0.0006279960507526994,
+      "learning_rate": 0.00017483999907619695,
+      "loss": 46.0,
+      "step": 344
+    },
+    {
+      "epoch": 0.05822047842045311,
+      "grad_norm": 0.0006246848497539759,
+      "learning_rate": 0.00017469582460167174,
+      "loss": 46.0,
+      "step": 345
+    },
+    {
+      "epoch": 0.05838923343036746,
+      "grad_norm": 0.000596629804931581,
+      "learning_rate": 0.00017455129803297287,
+      "loss": 46.0,
+      "step": 346
+    },
+    {
+      "epoch": 0.05855798844028182,
+      "grad_norm": 0.0008878617081791162,
+      "learning_rate": 0.00017440642005135614,
+      "loss": 46.0,
+      "step": 347
+    },
+    {
+      "epoch": 0.05872674345019618,
+      "grad_norm": 0.0003601356002036482,
+      "learning_rate": 0.0001742611913397338,
+      "loss": 46.0,
+      "step": 348
+    },
+    {
+      "epoch": 0.05889549846011054,
+      "grad_norm": 0.0003498071455396712,
+      "learning_rate": 0.00017411561258267127,
+      "loss": 46.0,
+      "step": 349
+    },
+    {
+      "epoch": 0.05906425347002489,
+      "grad_norm": 0.0007128751021809876,
+      "learning_rate": 0.0001739696844663841,
+      "loss": 46.0,
+      "step": 350
+    },
+    {
+      "epoch": 0.059233008479939245,
+      "grad_norm": 0.00047719714348204434,
+      "learning_rate": 0.0001738234076787346,
+      "loss": 46.0,
+      "step": 351
+    },
+    {
+      "epoch": 0.059401763489853605,
+      "grad_norm": 0.0006972616538405418,
+      "learning_rate": 0.00017367678290922852,
+      "loss": 46.0,
+      "step": 352
+    },
+    {
+      "epoch": 0.05957051849976796,
+      "grad_norm": 0.0005275082658044994,
+      "learning_rate": 0.00017352981084901194,
+      "loss": 46.0,
+      "step": 353
+    },
+    {
+      "epoch": 0.05973927350968232,
+      "grad_norm": 0.001250717556104064,
+      "learning_rate": 0.000173382492190868,
+      "loss": 46.0,
+      "step": 354
+    },
+    {
+      "epoch": 0.05990802851959667,
+      "grad_norm": 0.0008419329533353448,
+      "learning_rate": 0.00017323482762921354,
+      "loss": 46.0,
+      "step": 355
+    },
+    {
+      "epoch": 0.060076783529511034,
+      "grad_norm": 0.0008670453680679202,
+      "learning_rate": 0.000173086817860096,
+      "loss": 46.0,
+      "step": 356
+    },
+    {
+      "epoch": 0.06024553853942539,
+      "grad_norm": 0.0014900369569659233,
+      "learning_rate": 0.00017293846358118988,
+      "loss": 46.0,
+      "step": 357
+    },
+    {
+      "epoch": 0.06041429354933975,
+      "grad_norm": 0.001586690079420805,
+      "learning_rate": 0.0001727897654917937,
+      "loss": 46.0,
+      "step": 358
+    },
+    {
+      "epoch": 0.0605830485592541,
+      "grad_norm": 0.0009236481855623424,
+      "learning_rate": 0.00017264072429282656,
+      "loss": 46.0,
+      "step": 359
+    },
+    {
+      "epoch": 0.06075180356916846,
+      "grad_norm": 0.0011958472896367311,
+      "learning_rate": 0.00017249134068682487,
+      "loss": 46.0,
+      "step": 360
+    },
+    {
+      "epoch": 0.060920558579082816,
+      "grad_norm": 0.0006783442222513258,
+      "learning_rate": 0.00017234161537793913,
+      "loss": 46.0,
+      "step": 361
+    },
+    {
+      "epoch": 0.06108931358899718,
+      "grad_norm": 0.0010664722649380565,
+      "learning_rate": 0.0001721915490719304,
+      "loss": 46.0,
+      "step": 362
+    },
+    {
+      "epoch": 0.06125806859891153,
+      "grad_norm": 0.0012871964136138558,
+      "learning_rate": 0.00017204114247616715,
+      "loss": 46.0,
+      "step": 363
+    },
+    {
+      "epoch": 0.061426823608825884,
+      "grad_norm": 0.0006326769944280386,
+      "learning_rate": 0.00017189039629962193,
+      "loss": 46.0,
+      "step": 364
+    },
+    {
+      "epoch": 0.061595578618740245,
+      "grad_norm": 0.0005951938219368458,
+      "learning_rate": 0.00017173931125286792,
+      "loss": 46.0,
+      "step": 365
+    },
+    {
+      "epoch": 0.061595578618740245,
+      "eval_loss": 11.5,
+      "eval_runtime": 14.8849,
+      "eval_samples_per_second": 167.619,
+      "eval_steps_per_second": 83.843,
+      "step": 365
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1457,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 365,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7271850639360.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:39f2c53575314be409a7340b1ffac25d9f3a8a9e4813e050819956978b358f81
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render. See raw diff