Training in progress, step 375, checkpoint

Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +36 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +23 -0
last-checkpoint/trainer_state.json +2666 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: EleutherAI/gpt-neo-125m
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EleutherAI/gpt-neo-125m",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "c_fc",
+    "out_proj",
+    "k_proj",
+    "c_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
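
As a rough sanity check on the config above, the number of LoRA weights it implies can be estimated by hand. The sketch below takes `r`, `lora_alpha`, and `target_modules` from `adapter_config.json`; the GPT-Neo 125M dimensions (hidden size 768, MLP width 3072, 12 layers) are assumptions not stated anywhere in this diff, and the byte estimate assumes the adapter was saved in fp32.

```python
# Assumed GPT-Neo 125M dimensions (not stated in this diff).
HIDDEN = 768
MLP_WIDTH = 3072
NUM_LAYERS = 12

# Values copied from adapter_config.json.
R = 8
LORA_ALPHA = 16

# Assumed (in_features, out_features) of each targeted module in one block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "out_proj": (HIDDEN, HIDDEN),
    "c_fc": (HIDDEN, MLP_WIDTH),
    "c_proj": (MLP_WIDTH, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per module."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total_params = per_layer * NUM_LAYERS

# The low-rank update B @ A is scaled by lora_alpha / r when use_rslora is false.
scaling = LORA_ALPHA / R

# Payload size if every weight is stored as a 4-byte float (assumption).
fp32_bytes = total_params * 4

print(total_params, scaling, fp32_bytes)
```

Under these assumptions the payload comes out slightly below the 5327496 bytes listed for `adapter_model.safetensors`; the small remainder would presumably be the safetensors header.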

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b540a97127573040eb13b3e05008a286d1ccf3b6dd510f9592bd5bca56e8abca
+size 5327496
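
The three added lines are a Git LFS pointer rather than the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it follows; the function names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path, pointer):
    """Return True if the file at `path` has the size and SHA-256 in the pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["digest"]


# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b540a97127573040eb13b3e05008a286d1ccf3b6dd510f9592bd5bca56e8abca
size 5327496
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The same helpers apply unchanged to the `optimizer.pt`, `rng_state.pth`, `scheduler.pt`, and `training_args.bin` pointers in this commit.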

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3584364385ee92d69d793759412266f118b58a4c6c21b7df6c8e77df372d6147
+size 2857850

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6a7b7eb2fe4472278ab50b6b7d479ddcfad70595ae73db4f92f3f6acbfe9eea0
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bf304e342001350c82d6970cec50fb92a4329a84dcb76ae8031bca03ca92aa9
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "50256": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 2048,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
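
The `chat_template` value above is a Jinja template. To make its behaviour easier to follow, here is a plain-Python re-expression of the same logic; `render_chat` is a hypothetical helper written for illustration, not the rendering path the tokenizer library itself uses. Each message is wrapped in header markers, its content is trimmed, the BOS token is prepended to the first message only, and an assistant header is appended when a generation prompt is requested.

```python
def render_chat(messages, bos_token="<|endoftext|>", add_generation_prompt=False):
    """Mirror the chat_template logic from tokenizer_config.json in plain Python."""
    parts = []
    for index, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # corresponds to the Jinja `trim` filter
            + "<|eot_id|>"
        )
        if index == 0:
            # Only the first message is prefixed with the BOS token.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example = render_chat(
    [{"role": "user", "content": "  Hello there  "}],
    add_generation_prompt=True,
)
print(example)
```

Only `<|endoftext|>` is listed in `added_tokens_decoder`, so the `<|start_header_id|>`, `<|end_header_id|>`, and `<|eot_id|>` markers would presumably be tokenized as ordinary text rather than as single special tokens.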

last-checkpoint/trainer_state.json ADDED
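
The `learning_rate` values in the `log_history` of `trainer_state.json` rise linearly from 2e-05 at step 1 to 0.0002 at step 10 and then decay slowly. One schedule consistent with the logged numbers is linear warmup followed by cosine decay; the sketch below is an inference from the log, not the confirmed configuration (which lives in `training_args.bin`). In particular, `TOTAL_STEPS = 1500` is a hypothetical value chosen because it fits the logged rates.

```python
import math

PEAK_LR = 2e-4       # learning_rate logged at step 10
WARMUP_STEPS = 10    # inferred: the rate grows by 2e-05 per step up to step 10
TOTAL_STEPS = 1500   # hypothetical: fitted to the logged values, not read from the diff


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


# Compare the sketch against a few entries copied from log_history.
LOGGED = {
    1: 2e-05,
    11: 0.00019999977772170748,
    50: 0.00019964456535631286,
    100: 0.00019820494141215104,
}
for step, logged_lr in LOGGED.items():
    print(step, logged_lr, lr_at(step))
```

If the fit holds, step 375 is a quarter of the way through the run, which would match `eval_steps` being 375.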

	@@ -0,0 +1,2666 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.22661177625863957,
+  "eval_steps": 375,
+  "global_step": 375,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0006042980700230389,
+      "grad_norm": 0.6723806858062744,
+      "learning_rate": 2e-05,
+      "loss": 0.7525,
+      "step": 1
+    },
+    {
+      "epoch": 0.0012085961400460777,
+      "grad_norm": 0.6356388926506042,
+      "learning_rate": 4e-05,
+      "loss": 1.4247,
+      "step": 2
+    },
+    {
+      "epoch": 0.0018128942100691166,
+      "grad_norm": 0.8352575898170471,
+      "learning_rate": 6e-05,
+      "loss": 2.0536,
+      "step": 3
+    },
+    {
+      "epoch": 0.0024171922800921555,
+      "grad_norm": 0.8536075353622437,
+      "learning_rate": 8e-05,
+      "loss": 2.207,
+      "step": 4
+    },
+    {
+      "epoch": 0.003021490350115194,
+      "grad_norm": 0.8931390047073364,
+      "learning_rate": 0.0001,
+      "loss": 2.3663,
+      "step": 5
+    },
+    {
+      "epoch": 0.0036257884201382332,
+      "grad_norm": 0.9978416562080383,
+      "learning_rate": 0.00012,
+      "loss": 2.3864,
+      "step": 6
+    },
+    {
+      "epoch": 0.004230086490161272,
+      "grad_norm": 0.9493173956871033,
+      "learning_rate": 0.00014,
+      "loss": 2.2914,
+      "step": 7
+    },
+    {
+      "epoch": 0.004834384560184311,
+      "grad_norm": 0.9359995722770691,
+      "learning_rate": 0.00016,
+      "loss": 2.3039,
+      "step": 8
+    },
+    {
+      "epoch": 0.00543868263020735,
+      "grad_norm": 1.0341345071792603,
+      "learning_rate": 0.00018,
+      "loss": 2.4413,
+      "step": 9
+    },
+    {
+      "epoch": 0.006042980700230388,
+      "grad_norm": 1.0705313682556152,
+      "learning_rate": 0.0002,
+      "loss": 2.4622,
+      "step": 10
+    },
+    {
+      "epoch": 0.006647278770253427,
+      "grad_norm": 1.157464861869812,
+      "learning_rate": 0.00019999977772170748,
+      "loss": 2.338,
+      "step": 11
+    },
+    {
+      "epoch": 0.0072515768402764665,
+      "grad_norm": 1.0905333757400513,
+      "learning_rate": 0.00019999911088781805,
+      "loss": 2.2185,
+      "step": 12
+    },
+    {
+      "epoch": 0.007855874910299505,
+      "grad_norm": 1.2449381351470947,
+      "learning_rate": 0.0001999979995012962,
+      "loss": 2.1223,
+      "step": 13
+    },
+    {
+      "epoch": 0.008460172980322544,
+      "grad_norm": 1.2261898517608643,
+      "learning_rate": 0.00019999644356708261,
+      "loss": 2.1547,
+      "step": 14
+    },
+    {
+      "epoch": 0.009064471050345583,
+      "grad_norm": 1.2600151300430298,
+      "learning_rate": 0.00019999444309209432,
+      "loss": 2.1783,
+      "step": 15
+    },
+    {
+      "epoch": 0.009668769120368622,
+      "grad_norm": 1.3068581819534302,
+      "learning_rate": 0.0001999919980852246,
+      "loss": 2.1884,
+      "step": 16
+    },
+    {
+      "epoch": 0.010273067190391661,
+      "grad_norm": 1.5525509119033813,
+      "learning_rate": 0.00019998910855734288,
+      "loss": 2.1489,
+      "step": 17
+    },
+    {
+      "epoch": 0.0108773652604147,
+      "grad_norm": 1.5854105949401855,
+      "learning_rate": 0.0001999857745212947,
+      "loss": 2.0207,
+      "step": 18
+    },
+    {
+      "epoch": 0.01148166333043774,
+      "grad_norm": 1.6069377660751343,
+      "learning_rate": 0.00019998199599190178,
+      "loss": 2.1773,
+      "step": 19
+    },
+    {
+      "epoch": 0.012085961400460777,
+      "grad_norm": 1.8314810991287231,
+      "learning_rate": 0.0001999777729859618,
+      "loss": 2.1165,
+      "step": 20
+    },
+    {
+      "epoch": 0.012690259470483816,
+      "grad_norm": 1.5624139308929443,
+      "learning_rate": 0.00019997310552224846,
+      "loss": 2.0546,
+      "step": 21
+    },
+    {
+      "epoch": 0.013294557540506855,
+      "grad_norm": 1.526838779449463,
+      "learning_rate": 0.00019996799362151122,
+      "loss": 2.17,
+      "step": 22
+    },
+    {
+      "epoch": 0.013898855610529894,
+      "grad_norm": 2.1960701942443848,
+      "learning_rate": 0.00019996243730647538,
+      "loss": 2.2203,
+      "step": 23
+    },
+    {
+      "epoch": 0.014503153680552933,
+      "grad_norm": 2.307544708251953,
+      "learning_rate": 0.00019995643660184191,
+      "loss": 2.1815,
+      "step": 24
+    },
+    {
+      "epoch": 0.015107451750575972,
+      "grad_norm": 2.294649362564087,
+      "learning_rate": 0.00019994999153428737,
+      "loss": 2.2256,
+      "step": 25
+    },
+    {
+      "epoch": 0.01571174982059901,
+      "grad_norm": 2.0167179107666016,
+      "learning_rate": 0.00019994310213246368,
+      "loss": 2.378,
+      "step": 26
+    },
+    {
+      "epoch": 0.01631604789062205,
+      "grad_norm": 1.8471157550811768,
+      "learning_rate": 0.00019993576842699816,
+      "loss": 2.2643,
+      "step": 27
+    },
+    {
+      "epoch": 0.016920345960645088,
+      "grad_norm": 2.103811502456665,
+      "learning_rate": 0.0001999279904504933,
+      "loss": 2.4134,
+      "step": 28
+    },
+    {
+      "epoch": 0.017524644030668127,
+      "grad_norm": 2.363506555557251,
+      "learning_rate": 0.00019991976823752653,
+      "loss": 2.3607,
+      "step": 29
+    },
+    {
+      "epoch": 0.018128942100691166,
+      "grad_norm": 2.0845320224761963,
+      "learning_rate": 0.00019991110182465032,
+      "loss": 2.4767,
+      "step": 30
+    },
+    {
+      "epoch": 0.018733240170714205,
+      "grad_norm": 2.4435393810272217,
+      "learning_rate": 0.00019990199125039174,
+      "loss": 2.3495,
+      "step": 31
+    },
+    {
+      "epoch": 0.019337538240737244,
+      "grad_norm": 2.0392420291900635,
+      "learning_rate": 0.00019989243655525247,
+      "loss": 2.3409,
+      "step": 32
+    },
+    {
+      "epoch": 0.019941836310760283,
+      "grad_norm": 2.0392611026763916,
+      "learning_rate": 0.00019988243778170853,
+      "loss": 2.2221,
+      "step": 33
+    },
+    {
+      "epoch": 0.020546134380783322,
+      "grad_norm": 2.322539806365967,
+      "learning_rate": 0.0001998719949742101,
+      "loss": 2.4324,
+      "step": 34
+    },
+    {
+      "epoch": 0.02115043245080636,
+      "grad_norm": 2.2848479747772217,
+      "learning_rate": 0.0001998611081791814,
+      "loss": 2.411,
+      "step": 35
+    },
+    {
+      "epoch": 0.0217547305208294,
+      "grad_norm": 2.3438031673431396,
+      "learning_rate": 0.00019984977744502038,
+      "loss": 2.3022,
+      "step": 36
+    },
+    {
+      "epoch": 0.02235902859085244,
+      "grad_norm": 2.3946008682250977,
+      "learning_rate": 0.00019983800282209857,
+      "loss": 2.4329,
+      "step": 37
+    },
+    {
+      "epoch": 0.02296332666087548,
+      "grad_norm": 2.265860080718994,
+      "learning_rate": 0.00019982578436276082,
+      "loss": 2.2641,
+      "step": 38
+    },
+    {
+      "epoch": 0.023567624730898514,
+      "grad_norm": 3.028747320175171,
+      "learning_rate": 0.00019981312212132512,
+      "loss": 2.2929,
+      "step": 39
+    },
+    {
+      "epoch": 0.024171922800921553,
+      "grad_norm": 2.5112571716308594,
+      "learning_rate": 0.00019980001615408228,
+      "loss": 2.4233,
+      "step": 40
+    },
+    {
+      "epoch": 0.024776220870944592,
+      "grad_norm": 2.5553221702575684,
+      "learning_rate": 0.00019978646651929572,
+      "loss": 2.5443,
+      "step": 41
+    },
+    {
+      "epoch": 0.02538051894096763,
+      "grad_norm": 2.7476682662963867,
+      "learning_rate": 0.00019977247327720128,
+      "loss": 2.4841,
+      "step": 42
+    },
+    {
+      "epoch": 0.02598481701099067,
+      "grad_norm": 2.6699960231781006,
+      "learning_rate": 0.0001997580364900068,
+      "loss": 2.5442,
+      "step": 43
+    },
+    {
+      "epoch": 0.02658911508101371,
+      "grad_norm": 2.905996561050415,
+      "learning_rate": 0.000199743156221892,
+      "loss": 2.7954,
+      "step": 44
+    },
+    {
+      "epoch": 0.02719341315103675,
+      "grad_norm": 3.1173646450042725,
+      "learning_rate": 0.00019972783253900808,
+      "loss": 2.5903,
+      "step": 45
+    },
+    {
+      "epoch": 0.027797711221059788,
+      "grad_norm": 3.142016887664795,
+      "learning_rate": 0.00019971206550947748,
+      "loss": 2.7681,
+      "step": 46
+    },
+    {
+      "epoch": 0.028402009291082827,
+      "grad_norm": 3.2653255462646484,
+      "learning_rate": 0.00019969585520339354,
+      "loss": 2.8531,
+      "step": 47
+    },
+    {
+      "epoch": 0.029006307361105866,
+      "grad_norm": 4.3081583976745605,
+      "learning_rate": 0.0001996792016928203,
+      "loss": 3.074,
+      "step": 48
+    },
+    {
+      "epoch": 0.029610605431128905,
+      "grad_norm": 7.22359037399292,
+      "learning_rate": 0.00019966210505179197,
+      "loss": 3.7639,
+      "step": 49
+    },
+    {
+      "epoch": 0.030214903501151944,
+      "grad_norm": 15.731882095336914,
+      "learning_rate": 0.00019964456535631286,
+      "loss": 4.334,
+      "step": 50
+    },
+    {
+      "epoch": 0.030819201571174983,
+      "grad_norm": 2.968768835067749,
+      "learning_rate": 0.0001996265826843568,
+      "loss": 0.7182,
+      "step": 51
+    },
+    {
+      "epoch": 0.03142349964119802,
+      "grad_norm": 3.0781517028808594,
+      "learning_rate": 0.00019960815711586696,
+      "loss": 1.2886,
+      "step": 52
+    },
+    {
+      "epoch": 0.03202779771122106,
+      "grad_norm": 3.273003101348877,
+      "learning_rate": 0.00019958928873275539,
+      "loss": 1.9765,
+      "step": 53
+    },
+    {
+      "epoch": 0.0326320957812441,
+      "grad_norm": 2.7970402240753174,
+      "learning_rate": 0.00019956997761890277,
+      "loss": 1.9967,
+      "step": 54
+    },
+    {
+      "epoch": 0.033236393851267136,
+      "grad_norm": 2.716160774230957,
+      "learning_rate": 0.00019955022386015792,
+      "loss": 2.1967,
+      "step": 55
+    },
+    {
+      "epoch": 0.033840691921290175,
+      "grad_norm": 2.9746975898742676,
+      "learning_rate": 0.00019953002754433743,
+      "loss": 2.313,
+      "step": 56
+    },
+    {
+      "epoch": 0.034444989991313214,
+      "grad_norm": 2.6240665912628174,
+      "learning_rate": 0.00019950938876122542,
+      "loss": 2.3107,
+      "step": 57
+    },
+    {
+      "epoch": 0.03504928806133625,
+      "grad_norm": 2.2789065837860107,
+      "learning_rate": 0.00019948830760257291,
+      "loss": 2.1248,
+      "step": 58
+    },
+    {
+      "epoch": 0.03565358613135929,
+      "grad_norm": 2.2027852535247803,
+      "learning_rate": 0.0001994667841620976,
+      "loss": 2.3983,
+      "step": 59
+    },
+    {
+      "epoch": 0.03625788420138233,
+      "grad_norm": 2.5889904499053955,
+      "learning_rate": 0.00019944481853548335,
+      "loss": 2.3289,
+      "step": 60
+    },
+    {
+      "epoch": 0.03686218227140537,
+      "grad_norm": 2.0383386611938477,
+      "learning_rate": 0.00019942241082037982,
+      "loss": 2.4328,
+      "step": 61
+    },
+    {
+      "epoch": 0.03746648034142841,
+      "grad_norm": 1.9245140552520752,
+      "learning_rate": 0.00019939956111640197,
+      "loss": 2.25,
+      "step": 62
+    },
+    {
+      "epoch": 0.03807077841145145,
+      "grad_norm": 2.250455141067505,
+      "learning_rate": 0.00019937626952512964,
+      "loss": 2.4095,
+      "step": 63
+    },
+    {
+      "epoch": 0.03867507648147449,
+      "grad_norm": 1.8156126737594604,
+      "learning_rate": 0.0001993525361501072,
+      "loss": 2.1022,
+      "step": 64
+    },
+    {
+      "epoch": 0.03927937455149753,
+      "grad_norm": 2.0205323696136475,
+      "learning_rate": 0.00019932836109684286,
+      "loss": 2.1322,
+      "step": 65
+    },
+    {
+      "epoch": 0.039883672621520566,
+      "grad_norm": 1.8734694719314575,
+      "learning_rate": 0.00019930374447280845,
+      "loss": 1.9571,
+      "step": 66
+    },
+    {
+      "epoch": 0.040487970691543605,
+      "grad_norm": 2.0151290893554688,
+      "learning_rate": 0.00019927868638743875,
+      "loss": 1.9123,
+      "step": 67
+    },
+    {
+      "epoch": 0.041092268761566644,
+      "grad_norm": 2.1093060970306396,
+      "learning_rate": 0.0001992531869521312,
+      "loss": 2.0122,
+      "step": 68
+    },
+    {
+      "epoch": 0.04169656683158968,
+      "grad_norm": 2.0080783367156982,
+      "learning_rate": 0.00019922724628024515,
+      "loss": 2.013,
+      "step": 69
+    },
+    {
+      "epoch": 0.04230086490161272,
+      "grad_norm": 2.097050428390503,
+      "learning_rate": 0.0001992008644871016,
+      "loss": 2.0439,
+      "step": 70
+    },
+    {
+      "epoch": 0.04290516297163576,
+      "grad_norm": 1.9204063415527344,
+      "learning_rate": 0.00019917404168998256,
+      "loss": 1.8985,
+      "step": 71
+    },
+    {
+      "epoch": 0.0435094610416588,
+      "grad_norm": 1.9057285785675049,
+      "learning_rate": 0.0001991467780081305,
+      "loss": 2.0124,
+      "step": 72
+    },
+    {
+      "epoch": 0.04411375911168184,
+      "grad_norm": 1.9287506341934204,
+      "learning_rate": 0.00019911907356274795,
+      "loss": 2.1344,
+      "step": 73
+    },
+    {
+      "epoch": 0.04471805718170488,
+      "grad_norm": 1.8858532905578613,
+      "learning_rate": 0.00019909092847699683,
+      "loss": 2.1661,
+      "step": 74
+    },
+    {
+      "epoch": 0.04532235525172792,
+      "grad_norm": 2.1366336345672607,
+      "learning_rate": 0.00019906234287599798,
+      "loss": 2.2085,
+      "step": 75
+    },
+    {
+      "epoch": 0.04592665332175096,
+      "grad_norm": 1.9908347129821777,
+      "learning_rate": 0.00019903331688683057,
+      "loss": 2.1772,
+      "step": 76
+    },
+    {
+      "epoch": 0.04653095139177399,
+      "grad_norm": 2.1487462520599365,
+      "learning_rate": 0.00019900385063853154,
+      "loss": 2.2814,
+      "step": 77
+    },
+    {
+      "epoch": 0.04713524946179703,
+      "grad_norm": 2.083444595336914,
+      "learning_rate": 0.00019897394426209505,
+      "loss": 2.2921,
+      "step": 78
+    },
+    {
+      "epoch": 0.04773954753182007,
+      "grad_norm": 2.356961250305176,
+      "learning_rate": 0.00019894359789047187,
+      "loss": 2.3103,
+      "step": 79
+    },
+    {
+      "epoch": 0.048343845601843106,
+      "grad_norm": 2.109005928039551,
+      "learning_rate": 0.00019891281165856873,
+      "loss": 2.2682,
+      "step": 80
+    },
+    {
+      "epoch": 0.048948143671866146,
+      "grad_norm": 2.2167046070098877,
+      "learning_rate": 0.00019888158570324795,
+      "loss": 2.3838,
+      "step": 81
+    },
+    {
+      "epoch": 0.049552441741889185,
+      "grad_norm": 2.2872414588928223,
+      "learning_rate": 0.0001988499201633265,
+      "loss": 2.2507,
+      "step": 82
+    },
+    {
+      "epoch": 0.050156739811912224,
+      "grad_norm": 2.393831253051758,
+      "learning_rate": 0.00019881781517957562,
+      "loss": 2.1679,
+      "step": 83
+    },
+    {
+      "epoch": 0.05076103788193526,
+      "grad_norm": 2.3482563495635986,
+      "learning_rate": 0.0001987852708947202,
+      "loss": 2.1021,
+      "step": 84
+    },
+    {
+      "epoch": 0.0513653359519583,
+      "grad_norm": 2.3528244495391846,
+      "learning_rate": 0.00019875228745343794,
+      "loss": 2.3149,
+      "step": 85
+    },
+    {
+      "epoch": 0.05196963402198134,
+      "grad_norm": 2.345458745956421,
+      "learning_rate": 0.0001987188650023589,
+      "loss": 2.2611,
+      "step": 86
+    },
+    {
+      "epoch": 0.05257393209200438,
+      "grad_norm": 2.626271963119507,
+      "learning_rate": 0.0001986850036900648,
+      "loss": 2.2347,
+      "step": 87
+    },
+    {
+      "epoch": 0.05317823016202742,
+      "grad_norm": 3.3226706981658936,
+      "learning_rate": 0.00019865070366708836,
+      "loss": 2.2486,
+      "step": 88
+    },
+    {
+      "epoch": 0.05378252823205046,
+      "grad_norm": 2.9476661682128906,
+      "learning_rate": 0.00019861596508591255,
+      "loss": 2.3295,
+      "step": 89
+    },
+    {
+      "epoch": 0.0543868263020735,
+      "grad_norm": 2.7701926231384277,
+      "learning_rate": 0.00019858078810097002,
+      "loss": 2.3878,
+      "step": 90
+    },
+    {
+      "epoch": 0.054991124372096536,
+      "grad_norm": 2.8630998134613037,
+      "learning_rate": 0.00019854517286864245,
+      "loss": 2.3634,
+      "step": 91
+    },
+    {
+      "epoch": 0.055595422442119576,
+      "grad_norm": 2.862698554992676,
+      "learning_rate": 0.0001985091195472596,
+      "loss": 2.4529,
+      "step": 92
+    },
+    {
+      "epoch": 0.056199720512142615,
+      "grad_norm": 2.695298433303833,
+      "learning_rate": 0.0001984726282970989,
+      "loss": 2.3837,
+      "step": 93
+    },
+    {
+      "epoch": 0.056804018582165654,
+      "grad_norm": 2.9406235218048096,
+      "learning_rate": 0.0001984356992803847,
+      "loss": 2.4949,
+      "step": 94
+    },
+    {
+      "epoch": 0.05740831665218869,
+      "grad_norm": 2.9532742500305176,
+      "learning_rate": 0.00019839833266128724,
+      "loss": 2.4258,
+      "step": 95
+    },
+    {
+      "epoch": 0.05801261472221173,
+      "grad_norm": 3.499074935913086,
+      "learning_rate": 0.00019836052860592237,
+      "loss": 2.6488,
+      "step": 96
+    },
+    {
+      "epoch": 0.05861691279223477,
+      "grad_norm": 4.071224212646484,
+      "learning_rate": 0.0001983222872823505,
+      "loss": 2.6862,
+      "step": 97
+    },
+    {
+      "epoch": 0.05922121086225781,
+      "grad_norm": 4.09813928604126,
+      "learning_rate": 0.00019828360886057594,
+      "loss": 2.8312,
+      "step": 98
+    },
+    {
+      "epoch": 0.05982550893228085,
+      "grad_norm": 6.380131721496582,
+      "learning_rate": 0.00019824449351254616,
+      "loss": 3.0643,
+      "step": 99
+    },
+    {
+      "epoch": 0.06042980700230389,
+      "grad_norm": 17.392900466918945,
+      "learning_rate": 0.00019820494141215104,
+      "loss": 4.2204,
+      "step": 100
+    },
+    {
+      "epoch": 0.06103410507232693,
+      "grad_norm": 2.4286563396453857,
+      "learning_rate": 0.000198164952735222,
+      "loss": 0.687,
+      "step": 101
+    },
+    {
+      "epoch": 0.061638403142349966,
+      "grad_norm": 2.4470341205596924,
+      "learning_rate": 0.00019812452765953135,
+      "loss": 1.4129,
+      "step": 102
+    },
+    {
+      "epoch": 0.062242701212373006,
+      "grad_norm": 2.4300434589385986,
+      "learning_rate": 0.00019808366636479147,
+      "loss": 1.8864,
+      "step": 103
+    },
+    {
+      "epoch": 0.06284699928239604,
+      "grad_norm": 2.781144380569458,
+      "learning_rate": 0.00019804236903265388,
+      "loss": 2.2268,
+      "step": 104
+    },
+    {
+      "epoch": 0.06345129735241908,
+      "grad_norm": 2.7623088359832764,
+      "learning_rate": 0.00019800063584670863,
+      "loss": 2.2182,
+      "step": 105
+    },
+    {
+      "epoch": 0.06405559542244212,
+      "grad_norm": 2.4676239490509033,
+      "learning_rate": 0.00019795846699248332,
+      "loss": 2.3688,
+      "step": 106
+    },
+    {
+      "epoch": 0.06465989349246516,
+      "grad_norm": 2.56716251373291,
+      "learning_rate": 0.00019791586265744237,
+      "loss": 2.3069,
+      "step": 107
+    },
+    {
+      "epoch": 0.0652641915624882,
+      "grad_norm": 2.625476837158203,
+      "learning_rate": 0.00019787282303098617,
+      "loss": 2.4915,
+      "step": 108
+    },
+    {
+      "epoch": 0.06586848963251124,
+      "grad_norm": 2.214947462081909,
+      "learning_rate": 0.0001978293483044502,
+      "loss": 2.4314,
+      "step": 109
+    },
+    {
+      "epoch": 0.06647278770253427,
+      "grad_norm": 2.3063435554504395,
+      "learning_rate": 0.00019778543867110426,
+      "loss": 2.3165,
+      "step": 110
+    },
+    {
+      "epoch": 0.06707708577255732,
+      "grad_norm": 2.1222662925720215,
+      "learning_rate": 0.00019774109432615147,
+      "loss": 2.3531,
+      "step": 111
+    },
+    {
+      "epoch": 0.06768138384258035,
+      "grad_norm": 2.2197558879852295,
+      "learning_rate": 0.00019769631546672756,
+      "loss": 2.2317,
+      "step": 112
+    },
+    {
+      "epoch": 0.0682856819126034,
+      "grad_norm": 2.209230661392212,
+      "learning_rate": 0.00019765110229189988,
+      "loss": 2.108,
+      "step": 113
+    },
+    {
+      "epoch": 0.06888997998262643,
+      "grad_norm": 2.125019073486328,
+      "learning_rate": 0.00019760545500266657,
+      "loss": 1.9513,
+      "step": 114
+    },
+    {
+      "epoch": 0.06949427805264947,
+      "grad_norm": 2.7421507835388184,
+      "learning_rate": 0.00019755937380195568,
+      "loss": 2.0994,
+      "step": 115
+    },
+    {
+      "epoch": 0.0700985761226725,
+      "grad_norm": 2.6599299907684326,
+      "learning_rate": 0.00019751285889462423,
+      "loss": 1.9108,
+      "step": 116
+    },
+    {
+      "epoch": 0.07070287419269555,
+      "grad_norm": 2.2113850116729736,
+      "learning_rate": 0.0001974659104874573,
+      "loss": 1.857,
+      "step": 117
+    },
+    {
+      "epoch": 0.07130717226271858,
+      "grad_norm": 2.392446279525757,
+      "learning_rate": 0.0001974185287891671,
+      "loss": 1.971,
+      "step": 118
+    },
+    {
+      "epoch": 0.07191147033274163,
+      "grad_norm": 2.2785661220550537,
+      "learning_rate": 0.0001973707140103921,
+      "loss": 1.9599,
+      "step": 119
+    },
+    {
+      "epoch": 0.07251576840276466,
+      "grad_norm": 2.4346420764923096,
+      "learning_rate": 0.00019732246636369605,
+      "loss": 2.0515,
+      "step": 120
+    },
+    {
+      "epoch": 0.07312006647278771,
+      "grad_norm": 2.22165846824646,
+      "learning_rate": 0.00019727378606356703,
+      "loss": 2.0886,
+      "step": 121
+    },
+    {
+      "epoch": 0.07372436454281074,
+      "grad_norm": 2.303600549697876,
+      "learning_rate": 0.00019722467332641656,
+      "loss": 2.0801,
+      "step": 122
+    },
+    {
+      "epoch": 0.07432866261283377,
+      "grad_norm": 2.389984130859375,
+      "learning_rate": 0.00019717512837057855,
+      "loss": 2.1252,
+      "step": 123
+    },
+    {
+      "epoch": 0.07493296068285682,
+      "grad_norm": 2.2050936222076416,
+      "learning_rate": 0.0001971251514163083,
+      "loss": 2.0506,
+      "step": 124
+    },
+    {
+      "epoch": 0.07553725875287985,
+      "grad_norm": 2.605313539505005,
+      "learning_rate": 0.0001970747426857817,
+      "loss": 2.1459,
+      "step": 125
+    },
+    {
+      "epoch": 0.0761415568229029,
+      "grad_norm": 2.3666372299194336,
+      "learning_rate": 0.00019702390240309404,
+      "loss": 2.217,
+      "step": 126
+    },
+    {
+      "epoch": 0.07674585489292593,
+      "grad_norm": 2.5370900630950928,
+      "learning_rate": 0.0001969726307942592,
+      "loss": 2.1515,
+      "step": 127
+    },
+    {
+      "epoch": 0.07735015296294898,
+      "grad_norm": 2.7379939556121826,
+      "learning_rate": 0.00019692092808720846,
+      "loss": 2.3117,
+      "step": 128
+    },
+    {
+      "epoch": 0.07795445103297201,
+      "grad_norm": 2.623459577560425,
+      "learning_rate": 0.0001968687945117896,
+      "loss": 2.2315,
+      "step": 129
+    },
+    {
+      "epoch": 0.07855874910299505,
+      "grad_norm": 2.609384775161743,
+      "learning_rate": 0.00019681623029976588,
+      "loss": 2.2411,
+      "step": 130
+    },
+    {
+      "epoch": 0.07916304717301809,
+      "grad_norm": 2.484173059463501,
+      "learning_rate": 0.00019676323568481498,
+      "loss": 2.0752,
+      "step": 131
+    },
+    {
+      "epoch": 0.07976734524304113,
+      "grad_norm": 2.5462472438812256,
+      "learning_rate": 0.00019670981090252792,
+      "loss": 2.2665,
+      "step": 132
+    },
+    {
+      "epoch": 0.08037164331306416,
+      "grad_norm": 2.5074727535247803,
+      "learning_rate": 0.00019665595619040808,
+      "loss": 2.2347,
+      "step": 133
+    },
+    {
+      "epoch": 0.08097594138308721,
+      "grad_norm": 2.9072914123535156,
+      "learning_rate": 0.0001966016717878702,
+      "loss": 2.2369,
+      "step": 134
+    },
+    {
+      "epoch": 0.08158023945311024,
+      "grad_norm": 2.8787453174591064,
+      "learning_rate": 0.00019654695793623907,
+      "loss": 2.2034,
+      "step": 135
+    },
+    {
+      "epoch": 0.08218453752313329,
+      "grad_norm": 2.8111212253570557,
+      "learning_rate": 0.0001964918148787488,
+      "loss": 2.2064,
+      "step": 136
+    },
+    {
+      "epoch": 0.08278883559315632,
+      "grad_norm": 2.9196033477783203,
+      "learning_rate": 0.00019643624286054144,
+      "loss": 2.3142,
+      "step": 137
+    },
+    {
+      "epoch": 0.08339313366317937,
+      "grad_norm": 3.3923306465148926,
+      "learning_rate": 0.00019638024212866606,
+      "loss": 2.3254,
+      "step": 138
+    },
+    {
+      "epoch": 0.0839974317332024,
+      "grad_norm": 3.270350217819214,
+      "learning_rate": 0.0001963238129320776,
+      "loss": 2.3072,
+      "step": 139
+    },
+    {
+      "epoch": 0.08460172980322545,
+      "grad_norm": 3.227761745452881,
+      "learning_rate": 0.00019626695552163578,
+      "loss": 2.441,
+      "step": 140
+    },
+    {
+      "epoch": 0.08520602787324848,
+      "grad_norm": 3.1120471954345703,
+      "learning_rate": 0.00019620967015010395,
+      "loss": 2.5266,
+      "step": 141
+    },
+    {
+      "epoch": 0.08581032594327152,
+      "grad_norm": 3.56309175491333,
+      "learning_rate": 0.00019615195707214803,
+      "loss": 2.1229,
+      "step": 142
+    },
+    {
+      "epoch": 0.08641462401329456,
+      "grad_norm": 3.308427333831787,
+      "learning_rate": 0.0001960938165443353,
+      "loss": 2.3412,
+      "step": 143
+    },
+    {
+      "epoch": 0.0870189220833176,
+      "grad_norm": 4.139370441436768,
+      "learning_rate": 0.00019603524882513327,
+      "loss": 2.6715,
+      "step": 144
+    },
+    {
+      "epoch": 0.08762322015334063,
+      "grad_norm": 4.013917922973633,
+      "learning_rate": 0.0001959762541749086,
+      "loss": 2.462,
+      "step": 145
+    },
+    {
+      "epoch": 0.08822751822336368,
+      "grad_norm": 4.387619495391846,
+      "learning_rate": 0.00019591683285592593,
+      "loss": 2.7622,
+      "step": 146
+    },
+    {
+      "epoch": 0.08883181629338671,
+      "grad_norm": 4.593787670135498,
+      "learning_rate": 0.00019585698513234663,
+      "loss": 2.8924,
+      "step": 147
+    },
+    {
+      "epoch": 0.08943611436340976,
+      "grad_norm": 5.254031181335449,
+      "learning_rate": 0.0001957967112702277,
+      "loss": 2.8513,
+      "step": 148
+    },
+    {
+      "epoch": 0.09004041243343279,
+      "grad_norm": 9.610785484313965,
+      "learning_rate": 0.00019573601153752052,
+      "loss": 3.7939,
+      "step": 149
+    },
+    {
+      "epoch": 0.09064471050345584,
+      "grad_norm": 18.917434692382812,
+      "learning_rate": 0.00019567488620406983,
+      "loss": 4.2185,
+      "step": 150
+    },
+    {
+      "epoch": 0.09124900857347887,
+      "grad_norm": 2.803217649459839,
+      "learning_rate": 0.00019561333554161224,
+      "loss": 0.6506,
+      "step": 151
+    },
+    {
+      "epoch": 0.09185330664350191,
+      "grad_norm": 2.6787402629852295,
+      "learning_rate": 0.0001955513598237753,
+      "loss": 1.4542,
+      "step": 152
+    },
+    {
+      "epoch": 0.09245760471352495,
+      "grad_norm": 2.889517307281494,
+      "learning_rate": 0.00019548895932607621,
+      "loss": 1.9922,
+      "step": 153
+    },
+    {
+      "epoch": 0.09306190278354798,
+      "grad_norm": 2.9428203105926514,
+      "learning_rate": 0.00019542613432592038,
+      "loss": 2.1081,
+      "step": 154
+    },
+    {
+      "epoch": 0.09366620085357102,
+      "grad_norm": 2.6233069896698,
+      "learning_rate": 0.00019536288510260056,
+      "loss": 2.1672,
+      "step": 155
+    },
+    {
+      "epoch": 0.09427049892359406,
+      "grad_norm": 2.8075478076934814,
+      "learning_rate": 0.00019529921193729534,
+      "loss": 2.2364,
+      "step": 156
+    },
+    {
+      "epoch": 0.0948747969936171,
+      "grad_norm": 2.7134907245635986,
+      "learning_rate": 0.00019523511511306793,
+      "loss": 2.2153,
+      "step": 157
+    },
+    {
+      "epoch": 0.09547909506364013,
+      "grad_norm": 2.70220947265625,
+      "learning_rate": 0.000195170594914865,
+      "loss": 2.3958,
+      "step": 158
+    },
+    {
+      "epoch": 0.09608339313366318,
+      "grad_norm": 2.827350378036499,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 2.4202,
+      "step": 159
+    },
+    {
+      "epoch": 0.09668769120368621,
+      "grad_norm": 2.6900534629821777,
+      "learning_rate": 0.00019504028554572864,
+      "loss": 2.3942,
+      "step": 160
+    },
+    {
+      "epoch": 0.09729198927370926,
+      "grad_norm": 2.4423742294311523,
+      "learning_rate": 0.00019497449695409408,
+      "loss": 2.2381,
+      "step": 161
+    },
+    {
+      "epoch": 0.09789628734373229,
+      "grad_norm": 2.530571222305298,
+      "learning_rate": 0.00019490828614707916,
+      "loss": 2.2482,
+      "step": 162
+    },
+    {
+      "epoch": 0.09850058541375534,
+      "grad_norm": 2.624455451965332,
+      "learning_rate": 0.00019484165341902845,
+      "loss": 2.1451,
+      "step": 163
+    },
+    {
+      "epoch": 0.09910488348377837,
+      "grad_norm": 2.657663583755493,
+      "learning_rate": 0.00019477459906616206,
+      "loss": 1.9959,
+      "step": 164
+    },
+    {
+      "epoch": 0.09970918155380142,
+      "grad_norm": 2.7544350624084473,
+      "learning_rate": 0.00019470712338657458,
+      "loss": 1.8559,
+      "step": 165
+    },
+    {
+      "epoch": 0.10031347962382445,
+      "grad_norm": 2.9291069507598877,
+      "learning_rate": 0.0001946392266802336,
+      "loss": 1.8975,
+      "step": 166
+    },
+    {
+      "epoch": 0.1009177776938475,
+      "grad_norm": 2.9229235649108887,
+      "learning_rate": 0.0001945709092489783,
+      "loss": 1.992,
+      "step": 167
+    },
+    {
+      "epoch": 0.10152207576387053,
+      "grad_norm": 2.5758590698242188,
+      "learning_rate": 0.00019450217139651844,
+      "loss": 1.8534,
+      "step": 168
+    },
+    {
+      "epoch": 0.10212637383389357,
+      "grad_norm": 2.4789786338806152,
+      "learning_rate": 0.0001944330134284326,
+      "loss": 2.0086,
+      "step": 169
+    },
+    {
+      "epoch": 0.1027306719039166,
+      "grad_norm": 2.6959614753723145,
+      "learning_rate": 0.00019436343565216711,
+      "loss": 2.0508,
+      "step": 170
+    },
+    {
+      "epoch": 0.10333496997393965,
+      "grad_norm": 2.539034128189087,
+      "learning_rate": 0.00019429343837703455,
+      "loss": 2.0941,
+      "step": 171
+    },
+    {
+      "epoch": 0.10393926804396268,
+      "grad_norm": 2.679861068725586,
+      "learning_rate": 0.0001942230219142124,
+      "loss": 1.9783,
+      "step": 172
+    },
+    {
+      "epoch": 0.10454356611398573,
+      "grad_norm": 2.8474485874176025,
+      "learning_rate": 0.0001941521865767417,
+      "loss": 1.9477,
+      "step": 173
+    },
+    {
+      "epoch": 0.10514786418400876,
+      "grad_norm": 2.6970903873443604,
+      "learning_rate": 0.0001940809326795256,
+      "loss": 1.9986,
+      "step": 174
+    },
+    {
+      "epoch": 0.1057521622540318,
+      "grad_norm": 2.604323387145996,
+      "learning_rate": 0.000194009260539328,
+      "loss": 2.15,
+      "step": 175
+    },
+    {
+      "epoch": 0.10635646032405484,
+      "grad_norm": 3.2425715923309326,
+      "learning_rate": 0.0001939371704747721,
+      "loss": 2.1733,
+      "step": 176
+    },
+    {
+      "epoch": 0.10696075839407788,
+      "grad_norm": 2.805544853210449,
+      "learning_rate": 0.00019386466280633906,
+      "loss": 2.1526,
+      "step": 177
+    },
+    {
+      "epoch": 0.10756505646410092,
+      "grad_norm": 2.6435155868530273,
+      "learning_rate": 0.00019379173785636646,
+      "loss": 2.2713,
+      "step": 178
+    },
+    {
+      "epoch": 0.10816935453412396,
+      "grad_norm": 3.0403223037719727,
+      "learning_rate": 0.000193718395949047,
+      "loss": 2.171,
+      "step": 179
+    },
+    {
+      "epoch": 0.108773652604147,
+      "grad_norm": 2.9794914722442627,
+      "learning_rate": 0.00019364463741042694,
+      "loss": 2.1918,
+      "step": 180
+    },
+    {
+      "epoch": 0.10937795067417004,
+      "grad_norm": 3.198793411254883,
+      "learning_rate": 0.00019357046256840473,
+      "loss": 2.3307,
+      "step": 181
+    },
+    {
+      "epoch": 0.10998224874419307,
+      "grad_norm": 3.435657262802124,
+      "learning_rate": 0.00019349587175272948,
+      "loss": 1.9757,
+      "step": 182
+    },
+    {
+      "epoch": 0.11058654681421612,
+      "grad_norm": 3.1969451904296875,
+      "learning_rate": 0.0001934208652949996,
+      "loss": 2.2,
+      "step": 183
+    },
+    {
+      "epoch": 0.11119084488423915,
+      "grad_norm": 3.012925863265991,
+      "learning_rate": 0.00019334544352866127,
+      "loss": 2.1743,
+      "step": 184
+    },
+    {
+      "epoch": 0.11179514295426218,
+      "grad_norm": 3.455808162689209,
+      "learning_rate": 0.00019326960678900688,
+      "loss": 2.0939,
+      "step": 185
+    },
+    {
+      "epoch": 0.11239944102428523,
+      "grad_norm": 3.4885973930358887,
+      "learning_rate": 0.00019319335541317361,
+      "loss": 2.0869,
+      "step": 186
+    },
+    {
+      "epoch": 0.11300373909430826,
+      "grad_norm": 3.6527769565582275,
+      "learning_rate": 0.00019311668974014208,
+      "loss": 2.1132,
+      "step": 187
+    },
+    {
+      "epoch": 0.11360803716433131,
+      "grad_norm": 3.5603349208831787,
+      "learning_rate": 0.00019303961011073447,
+      "loss": 2.495,
+      "step": 188
+    },
+    {
+      "epoch": 0.11421233523435434,
+      "grad_norm": 4.438934326171875,
+      "learning_rate": 0.00019296211686761346,
+      "loss": 2.2977,
+      "step": 189
+    },
+    {
+      "epoch": 0.11481663330437739,
+      "grad_norm": 3.6968555450439453,
+      "learning_rate": 0.00019288421035528028,
+      "loss": 2.2683,
+      "step": 190
+    },
+    {
+      "epoch": 0.11542093137440042,
+      "grad_norm": 4.076991081237793,
+      "learning_rate": 0.00019280589092007352,
+      "loss": 2.5081,
+      "step": 191
+    },
+    {
+      "epoch": 0.11602522944442346,
+      "grad_norm": 3.8638432025909424,
+      "learning_rate": 0.00019272715891016735,
+      "loss": 2.3628,
+      "step": 192
+    },
+    {
+      "epoch": 0.1166295275144465,
+      "grad_norm": 3.8374760150909424,
+      "learning_rate": 0.00019264801467557007,
+      "loss": 2.1119,
+      "step": 193
+    },
+    {
+      "epoch": 0.11723382558446954,
+      "grad_norm": 3.7434163093566895,
+      "learning_rate": 0.00019256845856812266,
+      "loss": 2.1354,
+      "step": 194
+    },
+    {
+      "epoch": 0.11783812365449257,
+      "grad_norm": 5.35501766204834,
+      "learning_rate": 0.000192488490941497,
+      "loss": 2.5971,
+      "step": 195
+    },
+    {
+      "epoch": 0.11844242172451562,
+      "grad_norm": 5.991562843322754,
+      "learning_rate": 0.00019240811215119448,
+      "loss": 2.8775,
+      "step": 196
+    },
+    {
+      "epoch": 0.11904671979453865,
+      "grad_norm": 5.207497596740723,
+      "learning_rate": 0.00019232732255454422,
+      "loss": 2.7139,
+      "step": 197
+    },
+    {
+      "epoch": 0.1196510178645617,
+      "grad_norm": 6.914007663726807,
+      "learning_rate": 0.00019224612251070175,
+      "loss": 2.8393,
+      "step": 198
+    },
+    {
+      "epoch": 0.12025531593458473,
+      "grad_norm": 14.335077285766602,
+      "learning_rate": 0.0001921645123806472,
+      "loss": 3.909,
+      "step": 199
+    },
+    {
+      "epoch": 0.12085961400460778,
+      "grad_norm": 22.205036163330078,
+      "learning_rate": 0.0001920824925271838,
+      "loss": 3.9603,
+      "step": 200
+    },
+    {
+      "epoch": 0.12146391207463081,
+      "grad_norm": 2.610422134399414,
+      "learning_rate": 0.0001920000633149362,
+      "loss": 0.6103,
+      "step": 201
+    },
+    {
+      "epoch": 0.12206821014465385,
+      "grad_norm": 2.129725217819214,
+      "learning_rate": 0.00019191722511034884,
+      "loss": 0.9237,
+      "step": 202
+    },
+    {
+      "epoch": 0.12267250821467689,
+      "grad_norm": 2.469688892364502,
+      "learning_rate": 0.00019183397828168448,
+      "loss": 1.882,
+      "step": 203
+    },
+    {
+      "epoch": 0.12327680628469993,
+      "grad_norm": 2.8455584049224854,
+      "learning_rate": 0.00019175032319902234,
+      "loss": 2.0722,
+      "step": 204
+    },
+    {
+      "epoch": 0.12388110435472297,
+      "grad_norm": 2.6948962211608887,
+      "learning_rate": 0.00019166626023425662,
+      "loss": 2.0877,
+      "step": 205
+    },
+    {
+      "epoch": 0.12448540242474601,
+      "grad_norm": 2.7360682487487793,
+      "learning_rate": 0.00019158178976109476,
+      "loss": 2.2041,
+      "step": 206
+    },
+    {
+      "epoch": 0.12508970049476906,
+      "grad_norm": 2.775461435317993,
+      "learning_rate": 0.0001914969121550558,
+      "loss": 2.2209,
+      "step": 207
+    },
+    {
+      "epoch": 0.12569399856479208,
+      "grad_norm": 2.772085428237915,
+      "learning_rate": 0.00019141162779346874,
+      "loss": 2.3135,
+      "step": 208
+    },
+    {
+      "epoch": 0.12629829663481512,
+      "grad_norm": 2.701401710510254,
+      "learning_rate": 0.00019132593705547082,
+      "loss": 2.354,
+      "step": 209
+    },
+    {
+      "epoch": 0.12690259470483817,
+      "grad_norm": 2.7918756008148193,
+      "learning_rate": 0.00019123984032200586,
+      "loss": 2.3682,
+      "step": 210
+    },
+    {
+      "epoch": 0.1275068927748612,
+      "grad_norm": 2.71183705329895,
+      "learning_rate": 0.00019115333797582254,
+      "loss": 2.3986,
+      "step": 211
+    },
+    {
+      "epoch": 0.12811119084488423,
+      "grad_norm": 2.859503984451294,
+      "learning_rate": 0.00019106643040147278,
+      "loss": 2.2909,
+      "step": 212
+    },
+    {
+      "epoch": 0.12871548891490728,
+      "grad_norm": 2.7157833576202393,
+      "learning_rate": 0.00019097911798530987,
+      "loss": 1.9269,
+      "step": 213
+    },
+    {
+      "epoch": 0.12931978698493032,
+      "grad_norm": 2.9616880416870117,
+      "learning_rate": 0.00019089140111548696,
+      "loss": 2.0798,
+      "step": 214
+    },
+    {
+      "epoch": 0.12992408505495334,
+      "grad_norm": 2.9561362266540527,
+      "learning_rate": 0.00019080328018195513,
+      "loss": 2.0269,
+      "step": 215
+    },
+    {
+      "epoch": 0.1305283831249764,
+      "grad_norm": 2.5822503566741943,
+      "learning_rate": 0.0001907147555764618,
+      "loss": 1.8083,
+      "step": 216
+    },
+    {
+      "epoch": 0.13113268119499943,
+      "grad_norm": 2.9241294860839844,
+      "learning_rate": 0.00019062582769254895,
+      "loss": 2.0313,
+      "step": 217
+    },
+    {
+      "epoch": 0.13173697926502248,
+      "grad_norm": 3.371105432510376,
+      "learning_rate": 0.00019053649692555135,
+      "loss": 1.8889,
+      "step": 218
+    },
+    {
+      "epoch": 0.1323412773350455,
+      "grad_norm": 2.92676043510437,
+      "learning_rate": 0.00019044676367259476,
+      "loss": 2.1494,
+      "step": 219
+    },
+    {
+      "epoch": 0.13294557540506854,
+      "grad_norm": 2.9326910972595215,
+      "learning_rate": 0.00019035662833259432,
+      "loss": 1.8016,
+      "step": 220
+    },
+    {
+      "epoch": 0.1335498734750916,
+      "grad_norm": 2.7013325691223145,
+      "learning_rate": 0.00019026609130625257,
+      "loss": 1.9979,
+      "step": 221
+    },
+    {
+      "epoch": 0.13415417154511464,
+      "grad_norm": 2.9696829319000244,
+      "learning_rate": 0.00019017515299605788,
+      "loss": 2.0573,
+      "step": 222
+    },
+    {
+      "epoch": 0.13475846961513765,
+      "grad_norm": 3.2566845417022705,
+      "learning_rate": 0.00019008381380628247,
+      "loss": 2.0659,
+      "step": 223
+    },
+    {
+      "epoch": 0.1353627676851607,
+      "grad_norm": 3.4677469730377197,
+      "learning_rate": 0.00018999207414298067,
+      "loss": 2.123,
+      "step": 224
+    },
+    {
+      "epoch": 0.13596706575518375,
+      "grad_norm": 2.9905354976654053,
+      "learning_rate": 0.00018989993441398726,
+      "loss": 2.1558,
+      "step": 225
+    },
+    {
+      "epoch": 0.1365713638252068,
+      "grad_norm": 3.78269362449646,
+      "learning_rate": 0.00018980739502891546,
+      "loss": 2.2024,
+      "step": 226
+    },
+    {
+      "epoch": 0.1371756618952298,
+      "grad_norm": 3.0096099376678467,
+      "learning_rate": 0.0001897144563991552,
+      "loss": 2.0295,
+      "step": 227
+    },
+    {
+      "epoch": 0.13777995996525286,
+      "grad_norm": 2.9910919666290283,
+      "learning_rate": 0.00018962111893787128,
+      "loss": 2.0755,
+      "step": 228
+    },
+    {
+      "epoch": 0.1383842580352759,
+      "grad_norm": 3.3000521659851074,
+      "learning_rate": 0.00018952738306000151,
+      "loss": 2.1896,
+      "step": 229
+    },
+    {
+      "epoch": 0.13898855610529895,
+      "grad_norm": 3.4028759002685547,
+      "learning_rate": 0.00018943324918225494,
+      "loss": 2.1814,
+      "step": 230
+    },
+    {
+      "epoch": 0.13959285417532197,
+      "grad_norm": 3.4083681106567383,
+      "learning_rate": 0.0001893387177231099,
+      "loss": 2.1202,
+      "step": 231
+    },
+    {
+      "epoch": 0.140197152245345,
+      "grad_norm": 3.344849109649658,
+      "learning_rate": 0.0001892437891028122,
+      "loss": 2.2417,
+      "step": 232
+    },
+    {
+      "epoch": 0.14080145031536806,
+      "grad_norm": 3.3460681438446045,
+      "learning_rate": 0.0001891484637433733,
+      "loss": 2.2566,
+      "step": 233
+    },
+    {
+      "epoch": 0.1414057483853911,
+      "grad_norm": 3.230792284011841,
+      "learning_rate": 0.00018905274206856837,
+      "loss": 2.1327,
+      "step": 234
+    },
+    {
+      "epoch": 0.14201004645541412,
+      "grad_norm": 3.6352362632751465,
+      "learning_rate": 0.00018895662450393438,
+      "loss": 2.2026,
+      "step": 235
+    },
+    {
+      "epoch": 0.14261434452543717,
+      "grad_norm": 3.4648375511169434,
+      "learning_rate": 0.00018886011147676833,
+      "loss": 2.2437,
+      "step": 236
+    },
+    {
+      "epoch": 0.14321864259546022,
+      "grad_norm": 3.5209550857543945,
+      "learning_rate": 0.00018876320341612522,
+      "loss": 2.3193,
+      "step": 237
+    },
+    {
+      "epoch": 0.14382294066548326,
+      "grad_norm": 3.629462242126465,
+      "learning_rate": 0.00018866590075281624,
+      "loss": 2.2131,
+      "step": 238
+    },
+    {
+      "epoch": 0.14442723873550628,
+      "grad_norm": 3.810232400894165,
+      "learning_rate": 0.00018856820391940674,
+      "loss": 2.2062,
+      "step": 239
+    },
+    {
+      "epoch": 0.14503153680552933,
+      "grad_norm": 4.461780071258545,
+      "learning_rate": 0.00018847011335021449,
+      "loss": 2.336,
+      "step": 240
+    },
+    {
+      "epoch": 0.14563583487555237,
+      "grad_norm": 3.840716600418091,
+      "learning_rate": 0.00018837162948130752,
+      "loss": 2.1435,
+      "step": 241
+    },
+    {
+      "epoch": 0.14624013294557542,
+      "grad_norm": 4.086057186126709,
+      "learning_rate": 0.00018827275275050233,
+      "loss": 2.2668,
+      "step": 242
+    },
+    {
+      "epoch": 0.14684443101559844,
+      "grad_norm": 4.305682182312012,
+      "learning_rate": 0.00018817348359736203,
+      "loss": 2.5264,
+      "step": 243
+    },
+    {
+      "epoch": 0.14744872908562148,
+      "grad_norm": 4.541891098022461,
+      "learning_rate": 0.00018807382246319412,
+      "loss": 2.4166,
+      "step": 244
+    },
+    {
+      "epoch": 0.14805302715564453,
+      "grad_norm": 4.768816947937012,
+      "learning_rate": 0.00018797376979104872,
+      "loss": 2.6156,
+      "step": 245
+    },
+    {
+      "epoch": 0.14865732522566755,
+      "grad_norm": 6.231074333190918,
+      "learning_rate": 0.00018787332602571662,
+      "loss": 2.5208,
+      "step": 246
+    },
+    {
+      "epoch": 0.1492616232956906,
+      "grad_norm": 6.638082504272461,
+      "learning_rate": 0.00018777249161372713,
+      "loss": 2.7291,
+      "step": 247
+    },
+    {
+      "epoch": 0.14986592136571364,
+      "grad_norm": 7.254632472991943,
+      "learning_rate": 0.00018767126700334634,
+      "loss": 3.0853,
+      "step": 248
+    },
+    {
+      "epoch": 0.15047021943573669,
+      "grad_norm": 15.801924705505371,
+      "learning_rate": 0.0001875696526445749,
+      "loss": 3.792,
+      "step": 249
+    },
+    {
+      "epoch": 0.1510745175057597,
+      "grad_norm": 34.2322883605957,
+      "learning_rate": 0.0001874676489891461,
+      "loss": 4.3077,
+      "step": 250
+    },
+    {
+      "epoch": 0.15167881557578275,
+      "grad_norm": 2.6348602771759033,
+      "learning_rate": 0.00018736525649052394,
+      "loss": 0.6426,
+      "step": 251
+    },
+    {
+      "epoch": 0.1522831136458058,
+      "grad_norm": 2.4239795207977295,
+      "learning_rate": 0.00018726247560390099,
+      "loss": 1.3992,
+      "step": 252
+    },
+    {
+      "epoch": 0.15288741171582884,
+      "grad_norm": 2.6609582901000977,
+      "learning_rate": 0.00018715930678619644,
+      "loss": 1.9515,
+      "step": 253
+    },
+    {
+      "epoch": 0.15349170978585186,
+      "grad_norm": 2.8232662677764893,
+      "learning_rate": 0.00018705575049605413,
+      "loss": 2.2264,
+      "step": 254
+    },
+    {
+      "epoch": 0.1540960078558749,
+      "grad_norm": 3.030365467071533,
+      "learning_rate": 0.00018695180719384029,
+      "loss": 2.1837,
+      "step": 255
+    },
+    {
+      "epoch": 0.15470030592589795,
+      "grad_norm": 2.785435199737549,
+      "learning_rate": 0.00018684747734164177,
+      "loss": 2.1539,
+      "step": 256
+    },
+    {
+      "epoch": 0.155304603995921,
+      "grad_norm": 2.9929676055908203,
+      "learning_rate": 0.00018674276140326376,
+      "loss": 2.2902,
+      "step": 257
+    },
+    {
+      "epoch": 0.15590890206594402,
+      "grad_norm": 2.787749767303467,
+      "learning_rate": 0.00018663765984422786,
+      "loss": 2.2207,
+      "step": 258
+    },
+    {
+      "epoch": 0.15651320013596706,
+      "grad_norm": 3.058166742324829,
+      "learning_rate": 0.00018653217313177004,
+      "loss": 2.328,
+      "step": 259
+    },
+    {
+      "epoch": 0.1571174982059901,
+      "grad_norm": 2.885544776916504,
+      "learning_rate": 0.00018642630173483832,
+      "loss": 2.4258,
+      "step": 260
+    },
+    {
+      "epoch": 0.15772179627601315,
+      "grad_norm": 2.925114870071411,
+      "learning_rate": 0.00018632004612409103,
+      "loss": 2.267,
+      "step": 261
+    },
+    {
+      "epoch": 0.15832609434603617,
+      "grad_norm": 3.052964210510254,
+      "learning_rate": 0.00018621340677189453,
+      "loss": 2.1573,
+      "step": 262
+    },
+    {
+      "epoch": 0.15893039241605922,
+      "grad_norm": 3.1201493740081787,
+      "learning_rate": 0.00018610638415232097,
+      "loss": 1.9854,
+      "step": 263
+    },
+    {
+      "epoch": 0.15953469048608226,
+      "grad_norm": 3.032017946243286,
+      "learning_rate": 0.00018599897874114652,
+      "loss": 2.0349,
+      "step": 264
+    },
+    {
+      "epoch": 0.1601389885561053,
+      "grad_norm": 2.9624085426330566,
+      "learning_rate": 0.00018589119101584898,
+      "loss": 1.8199,
+      "step": 265
+    },
+    {
+      "epoch": 0.16074328662612833,
+      "grad_norm": 3.859501361846924,
+      "learning_rate": 0.00018578302145560584,
+      "loss": 1.8289,
+      "step": 266
+    },
+    {
+      "epoch": 0.16134758469615137,
+      "grad_norm": 3.10041880607605,
+      "learning_rate": 0.00018567447054129195,
+      "loss": 1.8516,
+      "step": 267
+    },
+    {
+      "epoch": 0.16195188276617442,
+      "grad_norm": 3.583569288253784,
+      "learning_rate": 0.00018556553875547754,
+      "loss": 1.9975,
+      "step": 268
+    },
+    {
+      "epoch": 0.16255618083619747,
+      "grad_norm": 2.8302621841430664,
+      "learning_rate": 0.00018545622658242607,
+      "loss": 1.9481,
+      "step": 269
+    },
+    {
+      "epoch": 0.16316047890622049,
+      "grad_norm": 2.866164207458496,
+      "learning_rate": 0.00018534653450809197,
+      "loss": 1.8458,
+      "step": 270
+    },
+    {
+      "epoch": 0.16376477697624353,
+      "grad_norm": 2.974346399307251,
+      "learning_rate": 0.00018523646302011867,
+      "loss": 1.9694,
+      "step": 271
+    },
+    {
+      "epoch": 0.16436907504626658,
+      "grad_norm": 3.0076396465301514,
+      "learning_rate": 0.00018512601260783606,
+      "loss": 2.0929,
+      "step": 272
+    },
+    {
+      "epoch": 0.16497337311628962,
+      "grad_norm": 3.253673553466797,
+      "learning_rate": 0.00018501518376225887,
+      "loss": 2.0573,
+      "step": 273
+    },
+    {
+      "epoch": 0.16557767118631264,
+      "grad_norm": 3.42087721824646,
+      "learning_rate": 0.00018490397697608395,
+      "loss": 2.0241,
+      "step": 274
+    },
+    {
+      "epoch": 0.1661819692563357,
+      "grad_norm": 3.4087624549865723,
+      "learning_rate": 0.0001847923927436884,
+      "loss": 2.0986,
+      "step": 275
+    },
+    {
+      "epoch": 0.16678626732635873,
+      "grad_norm": 3.0722851753234863,
+      "learning_rate": 0.00018468043156112728,
+      "loss": 2.1384,
+      "step": 276
+    },
+    {
+      "epoch": 0.16739056539638175,
+      "grad_norm": 3.203979730606079,
+      "learning_rate": 0.0001845680939261314,
+      "loss": 2.2446,
+      "step": 277
+    },
+    {
+      "epoch": 0.1679948634664048,
+      "grad_norm": 3.2715237140655518,
+      "learning_rate": 0.00018445538033810515,
+      "loss": 2.1088,
+      "step": 278
+    },
+    {
+      "epoch": 0.16859916153642784,
+      "grad_norm": 3.1999757289886475,
+      "learning_rate": 0.00018434229129812418,
+      "loss": 2.1583,
+      "step": 279
+    },
+    {
+      "epoch": 0.1692034596064509,
+      "grad_norm": 3.3717172145843506,
+      "learning_rate": 0.0001842288273089332,
+      "loss": 2.0239,
+      "step": 280
+    },
+    {
+      "epoch": 0.1698077576764739,
+      "grad_norm": 3.339036464691162,
+      "learning_rate": 0.00018411498887494396,
+      "loss": 2.199,
+      "step": 281
+    },
+    {
+      "epoch": 0.17041205574649695,
+      "grad_norm": 3.3981027603149414,
+      "learning_rate": 0.00018400077650223263,
+      "loss": 2.254,
+      "step": 282
+    },
+    {
+      "epoch": 0.17101635381652,
+      "grad_norm": 3.418525457382202,
+      "learning_rate": 0.0001838861906985379,
+      "loss": 2.1244,
+      "step": 283
+    },
+    {
+      "epoch": 0.17162065188654305,
+      "grad_norm": 3.4555487632751465,
+      "learning_rate": 0.00018377123197325842,
+      "loss": 2.2547,
+      "step": 284
+    },
+    {
+      "epoch": 0.17222494995656606,
+      "grad_norm": 3.6327109336853027,
+      "learning_rate": 0.00018365590083745085,
+      "loss": 2.3695,
+      "step": 285
+    },
+    {
+      "epoch": 0.1728292480265891,
+      "grad_norm": 3.622626304626465,
+      "learning_rate": 0.00018354019780382735,
+      "loss": 2.2134,
+      "step": 286
+    },
+    {
+      "epoch": 0.17343354609661216,
+      "grad_norm": 4.037292957305908,
+      "learning_rate": 0.0001834241233867533,
+      "loss": 2.2762,
+      "step": 287
+    },
+    {
+      "epoch": 0.1740378441666352,
+      "grad_norm": 3.8968253135681152,
+      "learning_rate": 0.00018330767810224524,
+      "loss": 2.2187,
+      "step": 288
+    },
+    {
+      "epoch": 0.17464214223665822,
+      "grad_norm": 4.63348388671875,
+      "learning_rate": 0.0001831908624679683,
+      "loss": 2.2918,
+      "step": 289
+    },
+    {
+      "epoch": 0.17524644030668127,
+      "grad_norm": 4.117985248565674,
+      "learning_rate": 0.0001830736770032341,
+      "loss": 2.2636,
+      "step": 290
+    },
+    {
+      "epoch": 0.1758507383767043,
+      "grad_norm": 3.8585355281829834,
+      "learning_rate": 0.0001829561222289984,
+      "loss": 2.145,
+      "step": 291
+    },
+    {
+      "epoch": 0.17645503644672736,
+      "grad_norm": 4.775058269500732,
+      "learning_rate": 0.00018283819866785853,
+      "loss": 2.3798,
+      "step": 292
+    },
+    {
+      "epoch": 0.17705933451675038,
+      "grad_norm": 4.206576824188232,
+      "learning_rate": 0.0001827199068440516,
+      "loss": 2.2302,
+      "step": 293
+    },
+    {
+      "epoch": 0.17766363258677342,
+      "grad_norm": 5.961564064025879,
+      "learning_rate": 0.00018260124728345162,
+      "loss": 2.6911,
+      "step": 294
+    },
+    {
+      "epoch": 0.17826793065679647,
+      "grad_norm": 5.647076606750488,
+      "learning_rate": 0.00018248222051356754,
+      "loss": 2.8343,
+      "step": 295
+    },
+    {
+      "epoch": 0.17887222872681952,
+      "grad_norm": 5.922065258026123,
+      "learning_rate": 0.00018236282706354063,
+      "loss": 2.4506,
+      "step": 296
+    },
+    {
+      "epoch": 0.17947652679684253,
+      "grad_norm": 6.84862756729126,
+      "learning_rate": 0.00018224306746414238,
+      "loss": 2.8937,
+      "step": 297
+    },
+    {
+      "epoch": 0.18008082486686558,
+      "grad_norm": 7.7207746505737305,
+      "learning_rate": 0.00018212294224777197,
+      "loss": 2.7428,
+      "step": 298
+    },
+    {
+      "epoch": 0.18068512293688863,
+      "grad_norm": 13.893342018127441,
+      "learning_rate": 0.00018200245194845399,
+      "loss": 3.5557,
+      "step": 299
+    },
+    {
+      "epoch": 0.18128942100691167,
+      "grad_norm": 43.88550567626953,
+      "learning_rate": 0.00018188159710183594,
+      "loss": 4.1933,
+      "step": 300
+    },
+    {
+      "epoch": 0.1818937190769347,
+      "grad_norm": 2.99881911277771,
+      "learning_rate": 0.000181760378245186,
+      "loss": 0.6338,
+      "step": 301
+    },
+    {
+      "epoch": 0.18249801714695774,
+      "grad_norm": 2.6720895767211914,
+      "learning_rate": 0.00018163879591739067,
+      "loss": 1.2616,
+      "step": 302
+    },
+    {
+      "epoch": 0.18310231521698078,
+      "grad_norm": 3.5430455207824707,
+      "learning_rate": 0.0001815168506589521,
+      "loss": 1.9207,
+      "step": 303
+    },
+    {
+      "epoch": 0.18370661328700383,
+      "grad_norm": 3.131864309310913,
+      "learning_rate": 0.000181394543011986,
+      "loss": 2.1036,
+      "step": 304
+    },
+    {
+      "epoch": 0.18431091135702685,
+      "grad_norm": 2.913851499557495,
+      "learning_rate": 0.00018127187352021907,
+      "loss": 2.2422,
+      "step": 305
+    },
+    {
+      "epoch": 0.1849152094270499,
+      "grad_norm": 2.898662567138672,
+      "learning_rate": 0.0001811488427289866,
+      "loss": 2.1503,
+      "step": 306
+    },
+    {
+      "epoch": 0.18551950749707294,
+      "grad_norm": 2.8844833374023438,
+      "learning_rate": 0.00018102545118523007,
+      "loss": 2.3369,
+      "step": 307
+    },
+    {
+      "epoch": 0.18612380556709596,
+      "grad_norm": 3.364750623703003,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 2.2679,
+      "step": 308
+    },
+    {
+      "epoch": 0.186728103637119,
+      "grad_norm": 3.5225651264190674,
+      "learning_rate": 0.00018077758803592718,
+      "loss": 2.3331,
+      "step": 309
+    },
+    {
+      "epoch": 0.18733240170714205,
+      "grad_norm": 3.346677303314209,
+      "learning_rate": 0.00018065311753227273,
+      "loss": 2.309,
+      "step": 310
+    },
+    {
+      "epoch": 0.1879366997771651,
+      "grad_norm": 3.386894702911377,
+      "learning_rate": 0.0001805282884798732,
+      "loss": 2.0257,
+      "step": 311
+    },
+    {
+      "epoch": 0.1885409978471881,
+      "grad_norm": 2.942502737045288,
+      "learning_rate": 0.00018040310143366446,
+      "loss": 2.1896,
+      "step": 312
+    },
+    {
+      "epoch": 0.18914529591721116,
+      "grad_norm": 3.2558510303497314,
+      "learning_rate": 0.00018027755695017368,
+      "loss": 1.828,
+      "step": 313
+    },
+    {
+      "epoch": 0.1897495939872342,
+      "grad_norm": 3.073988676071167,
+      "learning_rate": 0.00018015165558751717,
+      "loss": 1.9587,
+      "step": 314
+    },
+    {
+      "epoch": 0.19035389205725725,
+      "grad_norm": 3.157158374786377,
+      "learning_rate": 0.00018002539790539773,
+      "loss": 1.748,
+      "step": 315
+    },
+    {
+      "epoch": 0.19095819012728027,
+      "grad_norm": 3.194667100906372,
+      "learning_rate": 0.00017989878446510215,
+      "loss": 1.8558,
+      "step": 316
+    },
+    {
+      "epoch": 0.19156248819730332,
+      "grad_norm": 2.8696138858795166,
+      "learning_rate": 0.00017977181582949888,
+      "loss": 1.8613,
+      "step": 317
+    },
+    {
+      "epoch": 0.19216678626732636,
+      "grad_norm": 3.149181842803955,
+      "learning_rate": 0.0001796444925630353,
+      "loss": 1.9398,
+      "step": 318
+    },
+    {
+      "epoch": 0.1927710843373494,
+      "grad_norm": 3.128167152404785,
+      "learning_rate": 0.00017951681523173542,
+      "loss": 1.8389,
+      "step": 319
+    },
+    {
+      "epoch": 0.19337538240737243,
+      "grad_norm": 3.1106295585632324,
+      "learning_rate": 0.0001793887844031972,
+      "loss": 1.7954,
+      "step": 320
+    },
+    {
+      "epoch": 0.19397968047739547,
+      "grad_norm": 3.2423572540283203,
+      "learning_rate": 0.00017926040064659014,
+      "loss": 1.9777,
+      "step": 321
+    },
+    {
+      "epoch": 0.19458397854741852,
+      "grad_norm": 3.1443405151367188,
+      "learning_rate": 0.0001791316645326526,
+      "loss": 1.8808,
+      "step": 322
+    },
+    {
+      "epoch": 0.19518827661744156,
+      "grad_norm": 3.0229270458221436,
+      "learning_rate": 0.00017900257663368963,
+      "loss": 2.0481,
+      "step": 323
+    },
+    {
+      "epoch": 0.19579257468746458,
+      "grad_norm": 3.2192728519439697,
+      "learning_rate": 0.0001788731375235698,
+      "loss": 1.8968,
+      "step": 324
+    },
+    {
+      "epoch": 0.19639687275748763,
+      "grad_norm": 3.273191213607788,
+      "learning_rate": 0.00017874334777772327,
+      "loss": 1.9875,
+      "step": 325
+    },
+    {
+      "epoch": 0.19700117082751067,
+      "grad_norm": 3.2770490646362305,
+      "learning_rate": 0.00017861320797313892,
+      "loss": 2.0555,
+      "step": 326
+    },
+    {
+      "epoch": 0.19760546889753372,
+      "grad_norm": 3.1857221126556396,
+      "learning_rate": 0.0001784827186883618,
+      "loss": 2.0772,
+      "step": 327
+    },
+    {
+      "epoch": 0.19820976696755674,
+      "grad_norm": 3.428528308868408,
+      "learning_rate": 0.00017835188050349064,
+      "loss": 2.1237,
+      "step": 328
+    },
+    {
+      "epoch": 0.19881406503757978,
+      "grad_norm": 3.3303983211517334,
+      "learning_rate": 0.00017822069400017516,
+      "loss": 2.139,
+      "step": 329
+    },
+    {
+      "epoch": 0.19941836310760283,
+      "grad_norm": 3.7317135334014893,
+      "learning_rate": 0.00017808915976161362,
+      "loss": 2.1451,
+      "step": 330
+    },
+    {
+      "epoch": 0.20002266117762588,
+      "grad_norm": 3.494471788406372,
+      "learning_rate": 0.00017795727837255015,
+      "loss": 2.252,
+      "step": 331
+    },
+    {
+      "epoch": 0.2006269592476489,
+      "grad_norm": 3.569424867630005,
+      "learning_rate": 0.00017782505041927216,
+      "loss": 2.1137,
+      "step": 332
+    },
+    {
+      "epoch": 0.20123125731767194,
+      "grad_norm": 3.4209699630737305,
+      "learning_rate": 0.00017769247648960774,
+      "loss": 2.0488,
+      "step": 333
+    },
+    {
+      "epoch": 0.201835555387695,
+      "grad_norm": 3.602125883102417,
+      "learning_rate": 0.00017755955717292296,
+      "loss": 2.1312,
+      "step": 334
+    },
+    {
+      "epoch": 0.20243985345771803,
+      "grad_norm": 3.67684268951416,
+      "learning_rate": 0.00017742629306011944,
+      "loss": 2.1442,
+      "step": 335
+    },
+    {
+      "epoch": 0.20304415152774105,
+      "grad_norm": 4.092548847198486,
+      "learning_rate": 0.00017729268474363154,
+      "loss": 2.1001,
+      "step": 336
+    },
+    {
+      "epoch": 0.2036484495977641,
+      "grad_norm": 4.0197014808654785,
+      "learning_rate": 0.0001771587328174239,
+      "loss": 2.3081,
+      "step": 337
+    },
+    {
+      "epoch": 0.20425274766778714,
+      "grad_norm": 3.9904866218566895,
+      "learning_rate": 0.0001770244378769885,
+      "loss": 2.1657,
+      "step": 338
+    },
+    {
+      "epoch": 0.20485704573781016,
+      "grad_norm": 4.197107315063477,
+      "learning_rate": 0.0001768898005193425,
+      "loss": 2.3272,
+      "step": 339
+    },
+    {
+      "epoch": 0.2054613438078332,
+      "grad_norm": 5.174318790435791,
+      "learning_rate": 0.000176754821343025,
+      "loss": 2.396,
+      "step": 340
+    },
+    {
+      "epoch": 0.20606564187785625,
+      "grad_norm": 4.303919792175293,
+      "learning_rate": 0.0001766195009480949,
+      "loss": 2.285,
+      "step": 341
+    },
+    {
+      "epoch": 0.2066699399478793,
+      "grad_norm": 4.526043891906738,
+      "learning_rate": 0.0001764838399361279,
+      "loss": 2.254,
+      "step": 342
+    },
+    {
+      "epoch": 0.20727423801790232,
+      "grad_norm": 4.815467357635498,
+      "learning_rate": 0.00017634783891021393,
+      "loss": 2.3658,
+      "step": 343
+    },
+    {
+      "epoch": 0.20787853608792536,
+      "grad_norm": 4.806483745574951,
+      "learning_rate": 0.00017621149847495458,
+      "loss": 2.4033,
+      "step": 344
+    },
+    {
+      "epoch": 0.2084828341579484,
+      "grad_norm": 5.406166076660156,
+      "learning_rate": 0.00017607481923646016,
+      "loss": 2.2625,
+      "step": 345
+    },
+    {
+      "epoch": 0.20908713222797146,
+      "grad_norm": 5.956484794616699,
+      "learning_rate": 0.0001759378018023473,
+      "loss": 2.635,
+      "step": 346
+    },
+    {
+      "epoch": 0.20969143029799447,
+      "grad_norm": 6.868587493896484,
+      "learning_rate": 0.00017580044678173592,
+      "loss": 2.5105,
+      "step": 347
+    },
+    {
+      "epoch": 0.21029572836801752,
+      "grad_norm": 7.553473949432373,
+      "learning_rate": 0.00017566275478524693,
+      "loss": 2.6961,
+      "step": 348
+    },
+    {
+      "epoch": 0.21090002643804057,
+      "grad_norm": 13.07988452911377,
+      "learning_rate": 0.0001755247264249991,
+      "loss": 3.5881,
+      "step": 349
+    },
+    {
+      "epoch": 0.2115043245080636,
+      "grad_norm": 31.356632232666016,
+      "learning_rate": 0.0001753863623146066,
+      "loss": 4.2451,
+      "step": 350
+    },
+    {
+      "epoch": 0.21210862257808663,
+      "grad_norm": 3.042705535888672,
+      "learning_rate": 0.00017524766306917618,
+      "loss": 0.5812,
+      "step": 351
+    },
+    {
+      "epoch": 0.21271292064810968,
+      "grad_norm": 2.7835421562194824,
+      "learning_rate": 0.0001751086293053045,
+      "loss": 0.9333,
+      "step": 352
+    },
+    {
+      "epoch": 0.21331721871813272,
+      "grad_norm": 2.9173827171325684,
+      "learning_rate": 0.0001749692616410753,
+      "loss": 1.7894,
+      "step": 353
+    },
+    {
+      "epoch": 0.21392151678815577,
+      "grad_norm": 3.177105665206909,
+      "learning_rate": 0.00017482956069605668,
+      "loss": 2.0359,
+      "step": 354
+    },
+    {
+      "epoch": 0.2145258148581788,
+      "grad_norm": 3.253769874572754,
+      "learning_rate": 0.00017468952709129846,
+      "loss": 2.1539,
+      "step": 355
+    },
+    {
+      "epoch": 0.21513011292820183,
+      "grad_norm": 3.295257329940796,
+      "learning_rate": 0.00017454916144932922,
+      "loss": 2.22,
+      "step": 356
+    },
+    {
+      "epoch": 0.21573441099822488,
+      "grad_norm": 2.9911811351776123,
+      "learning_rate": 0.0001744084643941536,
+      "loss": 2.2175,
+      "step": 357
+    },
+    {
+      "epoch": 0.21633870906824793,
+      "grad_norm": 3.004169464111328,
+      "learning_rate": 0.00017426743655124974,
+      "loss": 2.1925,
+      "step": 358
+    },
+    {
+      "epoch": 0.21694300713827094,
+      "grad_norm": 2.8875339031219482,
+      "learning_rate": 0.0001741260785475661,
+      "loss": 2.2666,
+      "step": 359
+    },
+    {
+      "epoch": 0.217547305208294,
+      "grad_norm": 3.057417154312134,
+      "learning_rate": 0.00017398439101151905,
+      "loss": 2.1484,
+      "step": 360
+    },
+    {
+      "epoch": 0.21815160327831704,
+      "grad_norm": 2.7903478145599365,
+      "learning_rate": 0.00017384237457298987,
+      "loss": 2.2869,
+      "step": 361
+    },
+    {
+      "epoch": 0.21875590134834008,
+      "grad_norm": 3.01904559135437,
+      "learning_rate": 0.00017370002986332193,
+      "loss": 2.3253,
+      "step": 362
+    },
+    {
+      "epoch": 0.2193601994183631,
+      "grad_norm": 3.055483102798462,
+      "learning_rate": 0.00017355735751531807,
+      "loss": 2.0748,
+      "step": 363
+    },
+    {
+      "epoch": 0.21996449748838615,
+      "grad_norm": 3.1257870197296143,
+      "learning_rate": 0.00017341435816323756,
+      "loss": 1.8722,
+      "step": 364
+    },
+    {
+      "epoch": 0.2205687955584092,
+      "grad_norm": 3.139169454574585,
+      "learning_rate": 0.00017327103244279348,
+      "loss": 2.0349,
+      "step": 365
+    },
+    {
+      "epoch": 0.22117309362843224,
+      "grad_norm": 3.2363922595977783,
+      "learning_rate": 0.00017312738099114973,
+      "loss": 1.8798,
+      "step": 366
+    },
+    {
+      "epoch": 0.22177739169845526,
+      "grad_norm": 3.5106101036071777,
+      "learning_rate": 0.00017298340444691835,
+      "loss": 1.9831,
+      "step": 367
+    },
+    {
+      "epoch": 0.2223816897684783,
+      "grad_norm": 3.2510836124420166,
+      "learning_rate": 0.00017283910345015647,
+      "loss": 1.8261,
+      "step": 368
+    },
+    {
+      "epoch": 0.22298598783850135,
+      "grad_norm": 3.1266050338745117,
+      "learning_rate": 0.0001726944786423637,
+      "loss": 1.7586,
+      "step": 369
+    },
+    {
+      "epoch": 0.22359028590852437,
+      "grad_norm": 3.6328091621398926,
+      "learning_rate": 0.00017254953066647913,
+      "loss": 1.9136,
+      "step": 370
+    },
+    {
+      "epoch": 0.2241945839785474,
+      "grad_norm": 3.0875422954559326,
+      "learning_rate": 0.00017240426016687863,
+      "loss": 1.9292,
+      "step": 371
+    },
+    {
+      "epoch": 0.22479888204857046,
+      "grad_norm": 3.426515579223633,
+      "learning_rate": 0.00017225866778937165,
+      "loss": 1.9904,
+      "step": 372
+    },
+    {
+      "epoch": 0.2254031801185935,
+      "grad_norm": 3.369088649749756,
+      "learning_rate": 0.00017211275418119876,
+      "loss": 1.7777,
+      "step": 373
+    },
+    {
+      "epoch": 0.22600747818861652,
+      "grad_norm": 3.675102710723877,
+      "learning_rate": 0.0001719665199910285,
+      "loss": 1.7818,
+      "step": 374
+    },
+    {
+      "epoch": 0.22661177625863957,
+      "grad_norm": 3.5006134510040283,
+      "learning_rate": 0.00017181996586895454,
+      "loss": 2.0411,
+      "step": 375
+    },
+    {
+      "epoch": 0.22661177625863957,
+      "eval_loss": 2.133070945739746,
+      "eval_runtime": 29.9007,
+      "eval_samples_per_second": 93.208,
+      "eval_steps_per_second": 46.621,
+      "step": 375
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 375,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6395465727737856.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4863d894ba3ce228a26c7331f57b4eefeb9c93b592ab53d82b32a503b450e298
+size 6840

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render. See raw diff