Training in progress, step 200, checkpoint

Files changed (12)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +42 -0
last-checkpoint/trainer_state.json +1458 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "OpenBuddy/openbuddy-llama2-13b-v8.1-fp16",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "up_proj",
+    "k_proj",
+    "v_proj",
+    "q_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
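
For orientation, the adapter configuration above implies a LoRA scaling factor and an approximate adapter size. The sketch below computes both from the values in `adapter_config.json`. The layer dimensions (hidden size 5120, MLP size 13824, 40 layers) are assumptions based on the standard Llama 2 13B architecture and are not stated in this commit; the helper names are hypothetical.

```python
# Values copied from adapter_config.json in this commit.
R = 32
LORA_ALPHA = 64
USE_RSLORA = False
TARGET_MODULES = [
    "gate_proj", "up_proj", "k_proj", "v_proj", "q_proj", "o_proj", "down_proj",
]

# ASSUMPTION: standard Llama 2 13B shapes; the commit does not state them.
HIDDEN = 5120
INTERMEDIATE = 13824
NUM_LAYERS = 40

# (in_features, out_features) of each targeted linear layer, under the assumption above.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(r, alpha, use_rslora=False):
    """Factor applied to the low-rank update: alpha / r, or alpha / sqrt(r) for rsLoRA."""
    return alpha / (r ** 0.5) if use_rslora else alpha / r


def lora_param_count(r, modules, num_layers):
    """Each adapted layer adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in (MODULE_SHAPES[m] for m in modules))
    return per_layer * num_layers


scaling = lora_scaling(R, LORA_ALPHA, USE_RSLORA)
n_params = lora_param_count(R, TARGET_MODULES, NUM_LAYERS)
print(f"scaling={scaling}, params={n_params:,}, fp32 bytes={n_params * 4:,}")
```

Under these assumptions the adapter has 125,173,760 trainable weights, or 500,695,040 bytes in fp32. That is close to the 500,770,656-byte size recorded for `adapter_model.safetensors`, which suggests (but does not prove) that the adapter weights were saved in 32-bit precision.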

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd25b45d5ae0ea6e43f913fe66e0f64d299bc676217383b7326c580a58dcb13f
+size 500770656
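
The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object hash, and the size in bytes. A minimal sketch of reading one is shown below; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling, and it assumes the three-line layout seen in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from last-checkpoint/adapter_model.safetensors above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:dd25b45d5ae0ea6e43f913fe66e0f64d299bc676217383b7326c580a58dcb13f
size 500770656
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The digest and size can be compared against a downloaded file to check that the transfer completed intact.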

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f061e11dff08092a31a772fd5c327bd05c3e90c8f1bb09f8ce977226e8c25b87
+size 254917780

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d28e178f0eda4d42cd46ff2afc6e7871a49cadbf7bcf4dce9b4ca05bf170c53
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2337c70d81c654a141de045fecd9cf914e3cff6a8f6115c0b8b06152174fe9e
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f440c53d2cc6f14a7ed7124dea5f5a7402fb4fc95bccb5d8be6d0f7e74d327ed
+size 568229

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
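
The `chat_template` above is a Jinja template that wraps each message in header markers and prepends the BOS token to the first message. The sketch below re-implements that logic in plain Python to show the string it produces; `render_chat` is a hypothetical helper written for illustration, not the tokenizer's own rendering code. The `<|start_header_id|>`, `<|end_header_id|>` and `<|eot_id|>` markers do not appear in `added_tokens_decoder` above, so how they are tokenized is not established by this commit.

```python
BOS_TOKEN = "<s>"  # bos_token from tokenizer_config.json


def render_chat(messages, add_generation_prompt=False, bos_token=BOS_TOKEN):
    """Mirror the chat_template: header, trimmed content, end-of-turn marker per message."""
    parts = []
    for index, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # Jinja's `trim` filter applies to the content only
            + "<|eot_id|>"
        )
        if index == 0:
            # Only the first message is prefixed with the BOS token.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  Hello there  "}],
    add_generation_prompt=True,
)
print(repr(prompt))
```

Because `add_bos_token` is `true` in the same config, tokenizing this rendered string with default settings may add a second BOS token; that depends on how the caller invokes the tokenizer.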

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1458 @@

+{
+  "best_metric": 0.8889594674110413,
+  "best_model_checkpoint": "miner_id_24/checkpoint-200",
+  "epoch": 0.027588109524794815,
+  "eval_steps": 200,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00013794054762397408,
+      "grad_norm": 0.48702943325042725,
+      "learning_rate": 6.666666666666667e-06,
+      "loss": 1.011,
+      "step": 1
+    },
+    {
+      "epoch": 0.00013794054762397408,
+      "eval_loss": 3.1967947483062744,
+      "eval_runtime": 23.4225,
+      "eval_samples_per_second": 2.519,
+      "eval_steps_per_second": 2.519,
+      "step": 1
+    },
+    {
+      "epoch": 0.00027588109524794816,
+      "grad_norm": 0.39228329062461853,
+      "learning_rate": 1.3333333333333333e-05,
+      "loss": 0.8269,
+      "step": 2
+    },
+    {
+      "epoch": 0.0004138216428719222,
+      "grad_norm": 0.5630053877830505,
+      "learning_rate": 2e-05,
+      "loss": 1.0138,
+      "step": 3
+    },
+    {
+      "epoch": 0.0005517621904958963,
+      "grad_norm": 0.5379119515419006,
+      "learning_rate": 2.6666666666666667e-05,
+      "loss": 0.8146,
+      "step": 4
+    },
+    {
+      "epoch": 0.0006897027381198703,
+      "grad_norm": 0.5841886401176453,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 0.7592,
+      "step": 5
+    },
+    {
+      "epoch": 0.0008276432857438444,
+      "grad_norm": 0.46231576800346375,
+      "learning_rate": 4e-05,
+      "loss": 0.9071,
+      "step": 6
+    },
+    {
+      "epoch": 0.0009655838333678184,
+      "grad_norm": 0.6419610381126404,
+      "learning_rate": 4.666666666666667e-05,
+      "loss": 0.9903,
+      "step": 7
+    },
+    {
+      "epoch": 0.0011035243809917926,
+      "grad_norm": 0.4809350073337555,
+      "learning_rate": 5.333333333333333e-05,
+      "loss": 0.8859,
+      "step": 8
+    },
+    {
+      "epoch": 0.0012414649286157666,
+      "grad_norm": 0.4701339900493622,
+      "learning_rate": 6e-05,
+      "loss": 0.8131,
+      "step": 9
+    },
+    {
+      "epoch": 0.0013794054762397406,
+      "grad_norm": 0.4429624676704407,
+      "learning_rate": 6.666666666666667e-05,
+      "loss": 0.8727,
+      "step": 10
+    },
+    {
+      "epoch": 0.0015173460238637148,
+      "grad_norm": 0.46402427554130554,
+      "learning_rate": 7.333333333333333e-05,
+      "loss": 0.7067,
+      "step": 11
+    },
+    {
+      "epoch": 0.0016552865714876887,
+      "grad_norm": 0.4485568106174469,
+      "learning_rate": 8e-05,
+      "loss": 0.7596,
+      "step": 12
+    },
+    {
+      "epoch": 0.001793227119111663,
+      "grad_norm": 0.3845665752887726,
+      "learning_rate": 8.666666666666667e-05,
+      "loss": 0.818,
+      "step": 13
+    },
+    {
+      "epoch": 0.0019311676667356369,
+      "grad_norm": 0.5298904180526733,
+      "learning_rate": 9.333333333333334e-05,
+      "loss": 0.9865,
+      "step": 14
+    },
+    {
+      "epoch": 0.002069108214359611,
+      "grad_norm": 0.5633847713470459,
+      "learning_rate": 0.0001,
+      "loss": 0.761,
+      "step": 15
+    },
+    {
+      "epoch": 0.0022070487619835853,
+      "grad_norm": 0.5660894513130188,
+      "learning_rate": 0.00010666666666666667,
+      "loss": 0.6838,
+      "step": 16
+    },
+    {
+      "epoch": 0.0023449893096075592,
+      "grad_norm": 0.6358833909034729,
+      "learning_rate": 0.00011333333333333334,
+      "loss": 0.9807,
+      "step": 17
+    },
+    {
+      "epoch": 0.002482929857231533,
+      "grad_norm": 0.36259546875953674,
+      "learning_rate": 0.00012,
+      "loss": 0.7468,
+      "step": 18
+    },
+    {
+      "epoch": 0.002620870404855507,
+      "grad_norm": 0.7406138777732849,
+      "learning_rate": 0.00012666666666666666,
+      "loss": 1.2012,
+      "step": 19
+    },
+    {
+      "epoch": 0.002758810952479481,
+      "grad_norm": 0.5918175578117371,
+      "learning_rate": 0.00013333333333333334,
+      "loss": 0.7447,
+      "step": 20
+    },
+    {
+      "epoch": 0.0028967515001034555,
+      "grad_norm": 0.4696408808231354,
+      "learning_rate": 0.00014,
+      "loss": 0.5811,
+      "step": 21
+    },
+    {
+      "epoch": 0.0030346920477274295,
+      "grad_norm": 0.8551336526870728,
+      "learning_rate": 0.00014666666666666666,
+      "loss": 0.8785,
+      "step": 22
+    },
+    {
+      "epoch": 0.0031726325953514035,
+      "grad_norm": 1.1069241762161255,
+      "learning_rate": 0.00015333333333333334,
+      "loss": 0.7652,
+      "step": 23
+    },
+    {
+      "epoch": 0.0033105731429753774,
+      "grad_norm": 0.4773555099964142,
+      "learning_rate": 0.00016,
+      "loss": 0.5968,
+      "step": 24
+    },
+    {
+      "epoch": 0.003448513690599352,
+      "grad_norm": 0.6919654011726379,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.785,
+      "step": 25
+    },
+    {
+      "epoch": 0.003586454238223326,
+      "grad_norm": 0.6016067862510681,
+      "learning_rate": 0.00017333333333333334,
+      "loss": 0.6783,
+      "step": 26
+    },
+    {
+      "epoch": 0.0037243947858473,
+      "grad_norm": 0.6246861815452576,
+      "learning_rate": 0.00018,
+      "loss": 1.1969,
+      "step": 27
+    },
+    {
+      "epoch": 0.0038623353334712738,
+      "grad_norm": 0.7044798135757446,
+      "learning_rate": 0.0001866666666666667,
+      "loss": 0.7046,
+      "step": 28
+    },
+    {
+      "epoch": 0.004000275881095248,
+      "grad_norm": 0.44917258620262146,
+      "learning_rate": 0.00019333333333333333,
+      "loss": 0.5572,
+      "step": 29
+    },
+    {
+      "epoch": 0.004138216428719222,
+      "grad_norm": 0.5091779232025146,
+      "learning_rate": 0.0002,
+      "loss": 0.6602,
+      "step": 30
+    },
+    {
+      "epoch": 0.004276156976343196,
+      "grad_norm": 0.5474268198013306,
+      "learning_rate": 0.0001999999989536666,
+      "loss": 0.7928,
+      "step": 31
+    },
+    {
+      "epoch": 0.0044140975239671705,
+      "grad_norm": 0.5607008934020996,
+      "learning_rate": 0.00019999999581466645,
+      "loss": 0.8733,
+      "step": 32
+    },
+    {
+      "epoch": 0.004552038071591144,
+      "grad_norm": 0.8838821053504944,
+      "learning_rate": 0.00019999999058299957,
+      "loss": 1.2598,
+      "step": 33
+    },
+    {
+      "epoch": 0.0046899786192151184,
+      "grad_norm": 0.41664811968803406,
+      "learning_rate": 0.00019999998325866613,
+      "loss": 0.5844,
+      "step": 34
+    },
+    {
+      "epoch": 0.004827919166839092,
+      "grad_norm": 0.552245557308197,
+      "learning_rate": 0.00019999997384166624,
+      "loss": 0.7623,
+      "step": 35
+    },
+    {
+      "epoch": 0.004965859714463066,
+      "grad_norm": 0.7509823441505432,
+      "learning_rate": 0.0001999999623320001,
+      "loss": 0.6058,
+      "step": 36
+    },
+    {
+      "epoch": 0.005103800262087041,
+      "grad_norm": 0.527321457862854,
+      "learning_rate": 0.00019999994872966798,
+      "loss": 0.8593,
+      "step": 37
+    },
+    {
+      "epoch": 0.005241740809711014,
+      "grad_norm": 0.793042778968811,
+      "learning_rate": 0.00019999993303467014,
+      "loss": 1.3254,
+      "step": 38
+    },
+    {
+      "epoch": 0.005379681357334989,
+      "grad_norm": 0.6100105047225952,
+      "learning_rate": 0.0001999999152470069,
+      "loss": 1.3563,
+      "step": 39
+    },
+    {
+      "epoch": 0.005517621904958962,
+      "grad_norm": 0.564547598361969,
+      "learning_rate": 0.00019999989536667863,
+      "loss": 0.977,
+      "step": 40
+    },
+    {
+      "epoch": 0.005655562452582937,
+      "grad_norm": 0.633572518825531,
+      "learning_rate": 0.0001999998733936858,
+      "loss": 1.2476,
+      "step": 41
+    },
+    {
+      "epoch": 0.005793503000206911,
+      "grad_norm": 0.4978499412536621,
+      "learning_rate": 0.0001999998493280288,
+      "loss": 0.5857,
+      "step": 42
+    },
+    {
+      "epoch": 0.005931443547830885,
+      "grad_norm": 0.3591054677963257,
+      "learning_rate": 0.00019999982316970817,
+      "loss": 0.5134,
+      "step": 43
+    },
+    {
+      "epoch": 0.006069384095454859,
+      "grad_norm": 0.6278326511383057,
+      "learning_rate": 0.00019999979491872448,
+      "loss": 0.8699,
+      "step": 44
+    },
+    {
+      "epoch": 0.006207324643078833,
+      "grad_norm": 0.39985278248786926,
+      "learning_rate": 0.00019999976457507826,
+      "loss": 1.0652,
+      "step": 45
+    },
+    {
+      "epoch": 0.006345265190702807,
+      "grad_norm": 0.6397168636322021,
+      "learning_rate": 0.0001999997321387702,
+      "loss": 0.8229,
+      "step": 46
+    },
+    {
+      "epoch": 0.006483205738326781,
+      "grad_norm": 0.4380306601524353,
+      "learning_rate": 0.00019999969760980095,
+      "loss": 0.7043,
+      "step": 47
+    },
+    {
+      "epoch": 0.006621146285950755,
+      "grad_norm": 0.5619568824768066,
+      "learning_rate": 0.00019999966098817123,
+      "loss": 0.7648,
+      "step": 48
+    },
+    {
+      "epoch": 0.006759086833574729,
+      "grad_norm": 0.5171204209327698,
+      "learning_rate": 0.0001999996222738818,
+      "loss": 0.9302,
+      "step": 49
+    },
+    {
+      "epoch": 0.006897027381198704,
+      "grad_norm": 0.36841997504234314,
+      "learning_rate": 0.00019999958146693354,
+      "loss": 0.6045,
+      "step": 50
+    },
+    {
+      "epoch": 0.007034967928822677,
+      "grad_norm": 0.9860273599624634,
+      "learning_rate": 0.0001999995385673272,
+      "loss": 0.9305,
+      "step": 51
+    },
+    {
+      "epoch": 0.007172908476446652,
+      "grad_norm": 0.5823448300361633,
+      "learning_rate": 0.00019999949357506376,
+      "loss": 0.5956,
+      "step": 52
+    },
+    {
+      "epoch": 0.007310849024070625,
+      "grad_norm": 0.49382463097572327,
+      "learning_rate": 0.0001999994464901441,
+      "loss": 0.5102,
+      "step": 53
+    },
+    {
+      "epoch": 0.0074487895716946,
+      "grad_norm": 0.688617467880249,
+      "learning_rate": 0.00019999939731256926,
+      "loss": 1.547,
+      "step": 54
+    },
+    {
+      "epoch": 0.007586730119318574,
+      "grad_norm": 0.38612183928489685,
+      "learning_rate": 0.0001999993460423402,
+      "loss": 0.546,
+      "step": 55
+    },
+    {
+      "epoch": 0.0077246706669425475,
+      "grad_norm": 0.38228049874305725,
+      "learning_rate": 0.00019999929267945808,
+      "loss": 0.4671,
+      "step": 56
+    },
+    {
+      "epoch": 0.007862611214566521,
+      "grad_norm": 0.8089519143104553,
+      "learning_rate": 0.00019999923722392398,
+      "loss": 0.826,
+      "step": 57
+    },
+    {
+      "epoch": 0.008000551762190496,
+      "grad_norm": 0.5636898875236511,
+      "learning_rate": 0.00019999917967573904,
+      "loss": 0.9961,
+      "step": 58
+    },
+    {
+      "epoch": 0.00813849230981447,
+      "grad_norm": 0.5183124542236328,
+      "learning_rate": 0.00019999912003490445,
+      "loss": 0.5569,
+      "step": 59
+    },
+    {
+      "epoch": 0.008276432857438443,
+      "grad_norm": 0.6319698095321655,
+      "learning_rate": 0.00019999905830142152,
+      "loss": 0.8879,
+      "step": 60
+    },
+    {
+      "epoch": 0.008414373405062419,
+      "grad_norm": 0.48757442831993103,
+      "learning_rate": 0.00019999899447529148,
+      "loss": 0.511,
+      "step": 61
+    },
+    {
+      "epoch": 0.008552313952686392,
+      "grad_norm": 0.39482003450393677,
+      "learning_rate": 0.00019999892855651575,
+      "loss": 0.7831,
+      "step": 62
+    },
+    {
+      "epoch": 0.008690254500310366,
+      "grad_norm": 0.7696112990379333,
+      "learning_rate": 0.0001999988605450956,
+      "loss": 1.0776,
+      "step": 63
+    },
+    {
+      "epoch": 0.008828195047934341,
+      "grad_norm": 0.45441484451293945,
+      "learning_rate": 0.00019999879044103254,
+      "loss": 0.6991,
+      "step": 64
+    },
+    {
+      "epoch": 0.008966135595558315,
+      "grad_norm": 0.46844032406806946,
+      "learning_rate": 0.00019999871824432798,
+      "loss": 0.6784,
+      "step": 65
+    },
+    {
+      "epoch": 0.009104076143182288,
+      "grad_norm": 0.7997144460678101,
+      "learning_rate": 0.00019999864395498347,
+      "loss": 1.0729,
+      "step": 66
+    },
+    {
+      "epoch": 0.009242016690806263,
+      "grad_norm": 0.36192160844802856,
+      "learning_rate": 0.0001999985675730005,
+      "loss": 0.6021,
+      "step": 67
+    },
+    {
+      "epoch": 0.009379957238430237,
+      "grad_norm": 0.529750645160675,
+      "learning_rate": 0.00019999848909838078,
+      "loss": 0.6529,
+      "step": 68
+    },
+    {
+      "epoch": 0.00951789778605421,
+      "grad_norm": 1.026302695274353,
+      "learning_rate": 0.00019999840853112587,
+      "loss": 0.6982,
+      "step": 69
+    },
+    {
+      "epoch": 0.009655838333678184,
+      "grad_norm": 0.4954220950603485,
+      "learning_rate": 0.00019999832587123747,
+      "loss": 0.7171,
+      "step": 70
+    },
+    {
+      "epoch": 0.00979377888130216,
+      "grad_norm": 0.8853374719619751,
+      "learning_rate": 0.0001999982411187173,
+      "loss": 0.832,
+      "step": 71
+    },
+    {
+      "epoch": 0.009931719428926133,
+      "grad_norm": 0.400943398475647,
+      "learning_rate": 0.00019999815427356718,
+      "loss": 0.76,
+      "step": 72
+    },
+    {
+      "epoch": 0.010069659976550106,
+      "grad_norm": 0.40010085701942444,
+      "learning_rate": 0.0001999980653357889,
+      "loss": 0.5045,
+      "step": 73
+    },
+    {
+      "epoch": 0.010207600524174082,
+      "grad_norm": 0.4322604238986969,
+      "learning_rate": 0.00019999797430538427,
+      "loss": 0.6147,
+      "step": 74
+    },
+    {
+      "epoch": 0.010345541071798055,
+      "grad_norm": 0.5326778888702393,
+      "learning_rate": 0.0001999978811823553,
+      "loss": 0.8898,
+      "step": 75
+    },
+    {
+      "epoch": 0.010483481619422029,
+      "grad_norm": 0.43207046389579773,
+      "learning_rate": 0.00019999778596670385,
+      "loss": 0.5025,
+      "step": 76
+    },
+    {
+      "epoch": 0.010621422167046004,
+      "grad_norm": 0.5074573159217834,
+      "learning_rate": 0.00019999768865843195,
+      "loss": 0.9172,
+      "step": 77
+    },
+    {
+      "epoch": 0.010759362714669977,
+      "grad_norm": 0.9970151782035828,
+      "learning_rate": 0.00019999758925754162,
+      "loss": 0.8964,
+      "step": 78
+    },
+    {
+      "epoch": 0.010897303262293951,
+      "grad_norm": 0.49125778675079346,
+      "learning_rate": 0.00019999748776403496,
+      "loss": 0.6298,
+      "step": 79
+    },
+    {
+      "epoch": 0.011035243809917925,
+      "grad_norm": 0.3789325952529907,
+      "learning_rate": 0.00019999738417791408,
+      "loss": 0.4213,
+      "step": 80
+    },
+    {
+      "epoch": 0.0111731843575419,
+      "grad_norm": 0.4855377674102783,
+      "learning_rate": 0.00019999727849918114,
+      "loss": 0.8499,
+      "step": 81
+    },
+    {
+      "epoch": 0.011311124905165873,
+      "grad_norm": 0.43113189935684204,
+      "learning_rate": 0.00019999717072783838,
+      "loss": 0.5535,
+      "step": 82
+    },
+    {
+      "epoch": 0.011449065452789847,
+      "grad_norm": 0.5266700983047485,
+      "learning_rate": 0.00019999706086388806,
+      "loss": 0.9168,
+      "step": 83
+    },
+    {
+      "epoch": 0.011587006000413822,
+      "grad_norm": 0.47261330485343933,
+      "learning_rate": 0.00019999694890733243,
+      "loss": 0.6962,
+      "step": 84
+    },
+    {
+      "epoch": 0.011724946548037796,
+      "grad_norm": 0.5425558090209961,
+      "learning_rate": 0.00019999683485817386,
+      "loss": 0.7948,
+      "step": 85
+    },
+    {
+      "epoch": 0.01186288709566177,
+      "grad_norm": 0.6287034749984741,
+      "learning_rate": 0.00019999671871641473,
+      "loss": 0.8707,
+      "step": 86
+    },
+    {
+      "epoch": 0.012000827643285744,
+      "grad_norm": 0.637447714805603,
+      "learning_rate": 0.00019999660048205747,
+      "loss": 0.9725,
+      "step": 87
+    },
+    {
+      "epoch": 0.012138768190909718,
+      "grad_norm": 0.42429324984550476,
+      "learning_rate": 0.0001999964801551046,
+      "loss": 0.7296,
+      "step": 88
+    },
+    {
+      "epoch": 0.012276708738533692,
+      "grad_norm": 0.549072265625,
+      "learning_rate": 0.00019999635773555857,
+      "loss": 0.6929,
+      "step": 89
+    },
+    {
+      "epoch": 0.012414649286157667,
+      "grad_norm": 0.4701695442199707,
+      "learning_rate": 0.000199996233223422,
+      "loss": 0.5134,
+      "step": 90
+    },
+    {
+      "epoch": 0.01255258983378164,
+      "grad_norm": 0.4564104974269867,
+      "learning_rate": 0.00019999610661869746,
+      "loss": 1.08,
+      "step": 91
+    },
+    {
+      "epoch": 0.012690530381405614,
+      "grad_norm": 6.529526233673096,
+      "learning_rate": 0.00019999597792138757,
+      "loss": 0.851,
+      "step": 92
+    },
+    {
+      "epoch": 0.012828470929029587,
+      "grad_norm": 0.5705176591873169,
+      "learning_rate": 0.00019999584713149512,
+      "loss": 0.8064,
+      "step": 93
+    },
+    {
+      "epoch": 0.012966411476653563,
+      "grad_norm": 1.0852056741714478,
+      "learning_rate": 0.00019999571424902276,
+      "loss": 1.0844,
+      "step": 94
+    },
+    {
+      "epoch": 0.013104352024277536,
+      "grad_norm": 0.4061054587364197,
+      "learning_rate": 0.00019999557927397328,
+      "loss": 0.5822,
+      "step": 95
+    },
+    {
+      "epoch": 0.01324229257190151,
+      "grad_norm": 0.560861349105835,
+      "learning_rate": 0.00019999544220634954,
+      "loss": 0.7624,
+      "step": 96
+    },
+    {
+      "epoch": 0.013380233119525485,
+      "grad_norm": 0.4556518793106079,
+      "learning_rate": 0.00019999530304615437,
+      "loss": 0.4449,
+      "step": 97
+    },
+    {
+      "epoch": 0.013518173667149459,
+      "grad_norm": 0.5117940902709961,
+      "learning_rate": 0.00019999516179339076,
+      "loss": 0.5312,
+      "step": 98
+    },
+    {
+      "epoch": 0.013656114214773432,
+      "grad_norm": 0.4420356750488281,
+      "learning_rate": 0.00019999501844806153,
+      "loss": 0.7041,
+      "step": 99
+    },
+    {
+      "epoch": 0.013794054762397407,
+      "grad_norm": 0.46731603145599365,
+      "learning_rate": 0.00019999487301016982,
+      "loss": 0.346,
+      "step": 100
+    },
+    {
+      "epoch": 0.013931995310021381,
+      "grad_norm": 0.4286811053752899,
+      "learning_rate": 0.0001999947254797186,
+      "loss": 0.5261,
+      "step": 101
+    },
+    {
+      "epoch": 0.014069935857645354,
+      "grad_norm": 0.5022907853126526,
+      "learning_rate": 0.00019999457585671095,
+      "loss": 0.4708,
+      "step": 102
+    },
+    {
+      "epoch": 0.01420787640526933,
+      "grad_norm": 0.6970852017402649,
+      "learning_rate": 0.00019999442414115004,
+      "loss": 0.823,
+      "step": 103
+    },
+    {
+      "epoch": 0.014345816952893303,
+      "grad_norm": 0.38055419921875,
+      "learning_rate": 0.000199994270333039,
+      "loss": 0.5039,
+      "step": 104
+    },
+    {
+      "epoch": 0.014483757500517277,
+      "grad_norm": 0.4658089578151703,
+      "learning_rate": 0.0001999941144323811,
+      "loss": 0.6424,
+      "step": 105
+    },
+    {
+      "epoch": 0.01462169804814125,
+      "grad_norm": 0.38460826873779297,
+      "learning_rate": 0.00019999395643917955,
+      "loss": 0.4712,
+      "step": 106
+    },
+    {
+      "epoch": 0.014759638595765226,
+      "grad_norm": 0.4541981518268585,
+      "learning_rate": 0.0001999937963534377,
+      "loss": 0.6932,
+      "step": 107
+    },
+    {
+      "epoch": 0.0148975791433892,
+      "grad_norm": 0.6719677448272705,
+      "learning_rate": 0.00019999363417515887,
+      "loss": 0.4079,
+      "step": 108
+    },
+    {
+      "epoch": 0.015035519691013173,
+      "grad_norm": 0.9907113313674927,
+      "learning_rate": 0.0001999934699043465,
+      "loss": 0.8976,
+      "step": 109
+    },
+    {
+      "epoch": 0.015173460238637148,
+      "grad_norm": 0.49312669038772583,
+      "learning_rate": 0.00019999330354100397,
+      "loss": 0.7548,
+      "step": 110
+    },
+    {
+      "epoch": 0.015311400786261121,
+      "grad_norm": 0.718002200126648,
+      "learning_rate": 0.0001999931350851348,
+      "loss": 1.2318,
+      "step": 111
+    },
+    {
+      "epoch": 0.015449341333885095,
+      "grad_norm": 0.4942666292190552,
+      "learning_rate": 0.0001999929645367425,
+      "loss": 0.6249,
+      "step": 112
+    },
+    {
+      "epoch": 0.01558728188150907,
+      "grad_norm": 0.6063429117202759,
+      "learning_rate": 0.0001999927918958306,
+      "loss": 0.6771,
+      "step": 113
+    },
+    {
+      "epoch": 0.015725222429133042,
+      "grad_norm": 0.41272974014282227,
+      "learning_rate": 0.0001999926171624028,
+      "loss": 0.4236,
+      "step": 114
+    },
+    {
+      "epoch": 0.01586316297675702,
+      "grad_norm": 0.4982720911502838,
+      "learning_rate": 0.0001999924403364627,
+      "loss": 0.774,
+      "step": 115
+    },
+    {
+      "epoch": 0.016001103524380993,
+      "grad_norm": 0.7102062106132507,
+      "learning_rate": 0.00019999226141801402,
+      "loss": 0.6597,
+      "step": 116
+    },
+    {
+      "epoch": 0.016139044072004966,
+      "grad_norm": 0.6146669983863831,
+      "learning_rate": 0.00019999208040706048,
+      "loss": 0.7579,
+      "step": 117
+    },
+    {
+      "epoch": 0.01627698461962894,
+      "grad_norm": 0.39095568656921387,
+      "learning_rate": 0.00019999189730360585,
+      "loss": 0.3573,
+      "step": 118
+    },
+    {
+      "epoch": 0.016414925167252913,
+      "grad_norm": 0.510899007320404,
+      "learning_rate": 0.00019999171210765404,
+      "loss": 0.7301,
+      "step": 119
+    },
+    {
+      "epoch": 0.016552865714876887,
+      "grad_norm": 0.7055239677429199,
+      "learning_rate": 0.00019999152481920887,
+      "loss": 0.8274,
+      "step": 120
+    },
+    {
+      "epoch": 0.016690806262500864,
+      "grad_norm": 0.5454011559486389,
+      "learning_rate": 0.00019999133543827427,
+      "loss": 0.6872,
+      "step": 121
+    },
+    {
+      "epoch": 0.016828746810124837,
+      "grad_norm": 0.889281690120697,
+      "learning_rate": 0.0001999911439648542,
+      "loss": 0.7992,
+      "step": 122
+    },
+    {
+      "epoch": 0.01696668735774881,
+      "grad_norm": 0.5498670339584351,
+      "learning_rate": 0.00019999095039895267,
+      "loss": 0.7759,
+      "step": 123
+    },
+    {
+      "epoch": 0.017104627905372784,
+      "grad_norm": 0.48078951239585876,
+      "learning_rate": 0.0001999907547405737,
+      "loss": 0.5658,
+      "step": 124
+    },
+    {
+      "epoch": 0.017242568452996758,
+      "grad_norm": 0.48576879501342773,
+      "learning_rate": 0.00019999055698972145,
+      "loss": 0.6504,
+      "step": 125
+    },
+    {
+      "epoch": 0.01738050900062073,
+      "grad_norm": 0.4521864652633667,
+      "learning_rate": 0.0001999903571464,
+      "loss": 0.5856,
+      "step": 126
+    },
+    {
+      "epoch": 0.017518449548244705,
+      "grad_norm": 0.6498937606811523,
+      "learning_rate": 0.00019999015521061358,
+      "loss": 1.0937,
+      "step": 127
+    },
+    {
+      "epoch": 0.017656390095868682,
+      "grad_norm": 0.5598773956298828,
+      "learning_rate": 0.00019998995118236638,
+      "loss": 0.4092,
+      "step": 128
+    },
+    {
+      "epoch": 0.017794330643492656,
+      "grad_norm": 0.5167121887207031,
+      "learning_rate": 0.00019998974506166265,
+      "loss": 0.5908,
+      "step": 129
+    },
+    {
+      "epoch": 0.01793227119111663,
+      "grad_norm": 0.4230504333972931,
+      "learning_rate": 0.00019998953684850678,
+      "loss": 0.5165,
+      "step": 130
+    },
+    {
+      "epoch": 0.018070211738740603,
+      "grad_norm": 0.4121663570404053,
+      "learning_rate": 0.00019998932654290307,
+      "loss": 0.573,
+      "step": 131
+    },
+    {
+      "epoch": 0.018208152286364576,
+      "grad_norm": 0.8193333745002747,
+      "learning_rate": 0.0001999891141448559,
+      "loss": 0.3659,
+      "step": 132
+    },
+    {
+      "epoch": 0.01834609283398855,
+      "grad_norm": 0.5470033288002014,
+      "learning_rate": 0.00019998889965436978,
+      "loss": 0.4961,
+      "step": 133
+    },
+    {
+      "epoch": 0.018484033381612527,
+      "grad_norm": 0.43804946541786194,
+      "learning_rate": 0.00019998868307144913,
+      "loss": 0.511,
+      "step": 134
+    },
+    {
+      "epoch": 0.0186219739292365,
+      "grad_norm": 0.5354853272438049,
+      "learning_rate": 0.00019998846439609852,
+      "loss": 0.8857,
+      "step": 135
+    },
+    {
+      "epoch": 0.018759914476860474,
+      "grad_norm": 0.44438180327415466,
+      "learning_rate": 0.00019998824362832255,
+      "loss": 0.7116,
+      "step": 136
+    },
+    {
+      "epoch": 0.018897855024484447,
+      "grad_norm": 0.41642293334007263,
+      "learning_rate": 0.0001999880207681258,
+      "loss": 0.794,
+      "step": 137
+    },
+    {
+      "epoch": 0.01903579557210842,
+      "grad_norm": 0.6521027088165283,
+      "learning_rate": 0.00019998779581551296,
+      "loss": 0.7167,
+      "step": 138
+    },
+    {
+      "epoch": 0.019173736119732394,
+      "grad_norm": 0.45690736174583435,
+      "learning_rate": 0.0001999875687704887,
+      "loss": 0.4207,
+      "step": 139
+    },
+    {
+      "epoch": 0.019311676667356368,
+      "grad_norm": 0.7973712682723999,
+      "learning_rate": 0.00019998733963305784,
+      "loss": 0.7225,
+      "step": 140
+    },
+    {
+      "epoch": 0.019449617214980345,
+      "grad_norm": 0.5635350346565247,
+      "learning_rate": 0.0001999871084032251,
+      "loss": 0.4286,
+      "step": 141
+    },
+    {
+      "epoch": 0.01958755776260432,
+      "grad_norm": 0.8758499026298523,
+      "learning_rate": 0.00019998687508099536,
+      "loss": 0.4271,
+      "step": 142
+    },
+    {
+      "epoch": 0.019725498310228292,
+      "grad_norm": 0.4160282611846924,
+      "learning_rate": 0.00019998663966637348,
+      "loss": 0.6697,
+      "step": 143
+    },
+    {
+      "epoch": 0.019863438857852266,
+      "grad_norm": 1.2644795179367065,
+      "learning_rate": 0.00019998640215936438,
+      "loss": 0.6819,
+      "step": 144
+    },
+    {
+      "epoch": 0.02000137940547624,
+      "grad_norm": 0.9141055941581726,
+      "learning_rate": 0.00019998616255997308,
+      "loss": 0.5806,
+      "step": 145
+    },
+    {
+      "epoch": 0.020139319953100213,
+      "grad_norm": 0.7500741481781006,
+      "learning_rate": 0.00019998592086820457,
+      "loss": 0.5593,
+      "step": 146
+    },
+    {
+      "epoch": 0.020277260500724186,
+      "grad_norm": 0.49818581342697144,
+      "learning_rate": 0.00019998567708406388,
+      "loss": 0.5216,
+      "step": 147
+    },
+    {
+      "epoch": 0.020415201048348163,
+      "grad_norm": 0.47777387499809265,
+      "learning_rate": 0.00019998543120755612,
+      "loss": 0.8662,
+      "step": 148
+    },
+    {
+      "epoch": 0.020553141595972137,
+      "grad_norm": 0.5563037991523743,
+      "learning_rate": 0.00019998518323868648,
+      "loss": 0.7386,
+      "step": 149
+    },
+    {
+      "epoch": 0.02069108214359611,
+      "grad_norm": 0.44376295804977417,
+      "learning_rate": 0.0001999849331774601,
+      "loss": 0.8587,
+      "step": 150
+    },
+    {
+      "epoch": 0.020829022691220084,
+      "grad_norm": 0.7179962396621704,
+      "learning_rate": 0.00019998468102388223,
+      "loss": 0.7091,
+      "step": 151
+    },
+    {
+      "epoch": 0.020966963238844057,
+      "grad_norm": 0.48467734456062317,
+      "learning_rate": 0.00019998442677795814,
+      "loss": 0.6801,
+      "step": 152
+    },
+    {
+      "epoch": 0.02110490378646803,
+      "grad_norm": 0.4698256254196167,
+      "learning_rate": 0.00019998417043969318,
+      "loss": 0.3904,
+      "step": 153
+    },
+    {
+      "epoch": 0.021242844334092008,
+      "grad_norm": 0.5565569400787354,
+      "learning_rate": 0.0001999839120090927,
+      "loss": 0.708,
+      "step": 154
+    },
+    {
+      "epoch": 0.02138078488171598,
+      "grad_norm": 0.50385582447052,
+      "learning_rate": 0.00019998365148616201,
+      "loss": 0.6504,
+      "step": 155
+    },
+    {
+      "epoch": 0.021518725429339955,
+      "grad_norm": 0.7868030667304993,
+      "learning_rate": 0.00019998338887090676,
+      "loss": 1.2273,
+      "step": 156
+    },
+    {
+      "epoch": 0.02165666597696393,
+      "grad_norm": 0.5153452754020691,
+      "learning_rate": 0.00019998312416333227,
+      "loss": 1.1071,
+      "step": 157
+    },
+    {
+      "epoch": 0.021794606524587902,
+      "grad_norm": 0.41460031270980835,
+      "learning_rate": 0.00019998285736344418,
+      "loss": 0.4554,
+      "step": 158
+    },
+    {
+      "epoch": 0.021932547072211876,
+      "grad_norm": 0.3878832757472992,
+      "learning_rate": 0.00019998258847124802,
+      "loss": 0.3698,
+      "step": 159
+    },
+    {
+      "epoch": 0.02207048761983585,
+      "grad_norm": 0.5454118251800537,
+      "learning_rate": 0.00019998231748674948,
+      "loss": 0.729,
+      "step": 160
+    },
+    {
+      "epoch": 0.022208428167459826,
+      "grad_norm": 0.46344706416130066,
+      "learning_rate": 0.00019998204440995415,
+      "loss": 0.7578,
+      "step": 161
+    },
+    {
+      "epoch": 0.0223463687150838,
+      "grad_norm": 1.1156365871429443,
+      "learning_rate": 0.0001999817692408678,
+      "loss": 0.8657,
+      "step": 162
+    },
+    {
+      "epoch": 0.022484309262707773,
+      "grad_norm": 0.44892409443855286,
+      "learning_rate": 0.00019998149197949613,
+      "loss": 0.454,
+      "step": 163
+    },
+    {
+      "epoch": 0.022622249810331747,
+      "grad_norm": 0.4572855532169342,
+      "learning_rate": 0.00019998121262584503,
+      "loss": 0.7067,
+      "step": 164
+    },
+    {
+      "epoch": 0.02276019035795572,
+      "grad_norm": 0.6114012002944946,
+      "learning_rate": 0.00019998093117992025,
+      "loss": 0.5086,
+      "step": 165
+    },
+    {
+      "epoch": 0.022898130905579694,
+      "grad_norm": 0.4878060221672058,
+      "learning_rate": 0.00019998064764172778,
+      "loss": 0.613,
+      "step": 166
+    },
+    {
+      "epoch": 0.02303607145320367,
+      "grad_norm": 1.0030025243759155,
+      "learning_rate": 0.00019998036201127346,
+      "loss": 0.9149,
+      "step": 167
+    },
+    {
+      "epoch": 0.023174012000827644,
+      "grad_norm": 0.7854377031326294,
+      "learning_rate": 0.00019998007428856336,
+      "loss": 0.2438,
+      "step": 168
+    },
+    {
+      "epoch": 0.023311952548451618,
+      "grad_norm": 0.49808233976364136,
+      "learning_rate": 0.0001999797844736034,
+      "loss": 0.6435,
+      "step": 169
+    },
+    {
+      "epoch": 0.02344989309607559,
+      "grad_norm": 0.46107861399650574,
+      "learning_rate": 0.00019997949256639973,
+      "loss": 0.695,
+      "step": 170
+    },
+    {
+      "epoch": 0.023587833643699565,
+      "grad_norm": 1.883959412574768,
+      "learning_rate": 0.0001999791985669584,
+      "loss": 1.269,
+      "step": 171
+    },
+    {
+      "epoch": 0.02372577419132354,
+      "grad_norm": 0.9229387044906616,
+      "learning_rate": 0.0001999789024752856,
+      "loss": 0.5674,
+      "step": 172
+    },
+    {
+      "epoch": 0.023863714738947512,
+      "grad_norm": 0.559683084487915,
+      "learning_rate": 0.0001999786042913875,
+      "loss": 0.6982,
+      "step": 173
+    },
+    {
+      "epoch": 0.02400165528657149,
+      "grad_norm": 0.7660646438598633,
+      "learning_rate": 0.00019997830401527033,
+      "loss": 0.4579,
+      "step": 174
+    },
+    {
+      "epoch": 0.024139595834195463,
+      "grad_norm": 1.571911096572876,
+      "learning_rate": 0.00019997800164694044,
+      "loss": 1.0384,
+      "step": 175
+    },
+    {
+      "epoch": 0.024277536381819436,
+      "grad_norm": 0.5258024334907532,
+      "learning_rate": 0.00019997769718640412,
+      "loss": 0.4984,
+      "step": 176
+    },
+    {
+      "epoch": 0.02441547692944341,
+      "grad_norm": 0.40779802203178406,
+      "learning_rate": 0.0001999773906336677,
+      "loss": 0.3694,
+      "step": 177
+    },
+    {
+      "epoch": 0.024553417477067383,
+      "grad_norm": 0.41210711002349854,
+      "learning_rate": 0.00019997708198873763,
+      "loss": 0.5131,
+      "step": 178
+    },
+    {
+      "epoch": 0.024691358024691357,
+      "grad_norm": 0.6212690472602844,
+      "learning_rate": 0.0001999767712516204,
+      "loss": 1.1256,
+      "step": 179
+    },
+    {
+      "epoch": 0.024829298572315334,
+      "grad_norm": 0.5157824754714966,
+      "learning_rate": 0.00019997645842232244,
+      "loss": 0.4837,
+      "step": 180
+    },
+    {
+      "epoch": 0.024967239119939307,
+      "grad_norm": 0.5376565456390381,
+      "learning_rate": 0.0001999761435008504,
+      "loss": 0.8027,
+      "step": 181
+    },
+    {
+      "epoch": 0.02510517966756328,
+      "grad_norm": 0.6731228828430176,
+      "learning_rate": 0.00019997582648721075,
+      "loss": 0.9505,
+      "step": 182
+    },
+    {
+      "epoch": 0.025243120215187254,
+      "grad_norm": 0.7087239623069763,
+      "learning_rate": 0.0001999755073814102,
+      "loss": 0.8651,
+      "step": 183
+    },
+    {
+      "epoch": 0.025381060762811228,
+      "grad_norm": 0.6519793272018433,
+      "learning_rate": 0.00019997518618345542,
+      "loss": 1.0137,
+      "step": 184
+    },
+    {
+      "epoch": 0.0255190013104352,
+      "grad_norm": 0.5288932919502258,
+      "learning_rate": 0.0001999748628933531,
+      "loss": 0.8138,
+      "step": 185
+    },
+    {
+      "epoch": 0.025656941858059175,
+      "grad_norm": 0.702359139919281,
+      "learning_rate": 0.00019997453751111006,
+      "loss": 0.7999,
+      "step": 186
+    },
+    {
+      "epoch": 0.025794882405683152,
+      "grad_norm": 0.512363076210022,
+      "learning_rate": 0.00019997421003673305,
+      "loss": 0.8201,
+      "step": 187
+    },
+    {
+      "epoch": 0.025932822953307125,
+      "grad_norm": 0.5105271935462952,
+      "learning_rate": 0.00019997388047022897,
+      "loss": 0.3588,
+      "step": 188
+    },
+    {
+      "epoch": 0.0260707635009311,
+      "grad_norm": 0.5415822863578796,
+      "learning_rate": 0.0001999735488116047,
+      "loss": 0.4008,
+      "step": 189
+    },
+    {
+      "epoch": 0.026208704048555072,
+      "grad_norm": 0.7703432440757751,
+      "learning_rate": 0.00019997321506086714,
+      "loss": 1.1452,
+      "step": 190
+    },
+    {
+      "epoch": 0.026346644596179046,
+      "grad_norm": 0.46029290556907654,
+      "learning_rate": 0.0001999728792180233,
+      "loss": 0.5812,
+      "step": 191
+    },
+    {
+      "epoch": 0.02648458514380302,
+      "grad_norm": 0.5847637057304382,
+      "learning_rate": 0.0001999725412830803,
+      "loss": 0.5187,
+      "step": 192
+    },
+    {
+      "epoch": 0.026622525691426997,
+      "grad_norm": 0.4251965284347534,
+      "learning_rate": 0.0001999722012560451,
+      "loss": 0.5953,
+      "step": 193
+    },
+    {
+      "epoch": 0.02676046623905097,
+      "grad_norm": 0.38125622272491455,
+      "learning_rate": 0.0001999718591369248,
+      "loss": 0.3066,
+      "step": 194
+    },
+    {
+      "epoch": 0.026898406786674944,
+      "grad_norm": 0.384884774684906,
+      "learning_rate": 0.00019997151492572664,
+      "loss": 0.6125,
+      "step": 195
+    },
+    {
+      "epoch": 0.027036347334298917,
+      "grad_norm": 0.499237596988678,
+      "learning_rate": 0.00019997116862245778,
+      "loss": 0.7623,
+      "step": 196
+    },
+    {
+      "epoch": 0.02717428788192289,
+      "grad_norm": 0.3985907733440399,
+      "learning_rate": 0.0001999708202271255,
+      "loss": 0.4129,
+      "step": 197
+    },
+    {
+      "epoch": 0.027312228429546864,
+      "grad_norm": 0.5830117464065552,
+      "learning_rate": 0.00019997046973973704,
+      "loss": 0.7606,
+      "step": 198
+    },
+    {
+      "epoch": 0.027450168977170838,
+      "grad_norm": 0.43591436743736267,
+      "learning_rate": 0.00019997011716029977,
+      "loss": 0.7417,
+      "step": 199
+    },
+    {
+      "epoch": 0.027588109524794815,
+      "grad_norm": 0.46609047055244446,
+      "learning_rate": 0.00019996976248882103,
+      "loss": 0.5169,
+      "step": 200
+    },
+    {
+      "epoch": 0.027588109524794815,
+      "eval_loss": 0.8889594674110413,
+      "eval_runtime": 23.752,
+      "eval_samples_per_second": 2.484,
+      "eval_steps_per_second": 2.484,
+      "step": 200
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 21747,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 3,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.172390174457856e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d88a19fc804f51d0cd9e1bede1d40c25b6d35ae1db2f957334a38f2da6c58457
+size 6776