Training in progress, step 246, checkpoint

Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +62 -0
last-checkpoint/trainer_state.json +1755 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: dltjdgh0928/test_instruction
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "dltjdgh0928/test_instruction",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "down_proj",
+    "up_proj",
+    "k_proj",
+    "o_proj",
+    "v_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
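
The `r`, `lora_alpha`, and `use_rslora` fields above determine how strongly the low-rank update is weighted. The sketch below is illustrative only: it copies a subset of the values from this hunk and computes the usual LoRA scaling factor (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper names `lora_scaling` and `lora_param_count` are hypothetical and are not part of PEFT.

```python
import json
import math

# Subset of the values shown in the adapter_config.json hunk above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "peft_type": "LORA",
  "r": 8,
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA
    (use_rslora = true) uses lora_alpha / sqrt(r).
    """
    alpha, rank = config["lora_alpha"], config["r"]
    if config.get("use_rslora"):
        return alpha / math.sqrt(rank)
    return alpha / rank


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    return rank * (in_features + out_features)


scaling = lora_scaling(ADAPTER_CONFIG)
print(scaling)  # 2.0
```

As a usage example, `lora_param_count(4096, 4096, ADAPTER_CONFIG["r"])` gives the added parameter count for one square projection; the 4096 dimensions are an assumption for illustration, since the base model's layer sizes are not shown in this commit.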

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:04086e3372358cdde09289323d8bb4688853231d1bb420e64d093fcb39ea8c7d
+size 83945296
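
The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is left to the caller.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the adapter_model.safetensors hunk above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:04086e3372358cdde09289323d8bb4688853231d1bb420e64d093fcb39ea8c7d
size 83945296
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"])  # 83945296
```

The same parsing applies to the other pointer hunks in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.model`, `training_args.bin`).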

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 32000,
+  "<|im_start|>": 32001
+}

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:543647e7f59f7d1ba32270b144eeb70f649c721b88b7810a69c8a6448f1e3aae
+size 43122580

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88962f7a468f135f3fc4c54bd67b9827454474c0a80e417143aea327b7f21a13
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4317e92d4da91645725c2b965ebd87d685d970c35ca569b60e7b346137168906
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,62 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32001": {
+      "content": "<|im_start|>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "trust_remote_code": false,
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true,
+  "use_fast": true
+}
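
The `chat_template` string above is a Jinja template. As a rough illustration of what it produces, the sketch below mirrors its logic in plain Python; `render_chat` is a hypothetical name, and this is a hand-written approximation rather than the code path the tokenizer actually runs. Note that the template emits `<|start_header_id|>`, `<|end_header_id|>`, and `<|eot_id|>`, while `added_tokens.json` in this commit lists only `<|im_end|>` and `<|im_start|>`; how the tokenizer treats the template's markers is not visible here.

```python
BOS_TOKEN = "<s>"  # "bos_token" from the tokenizer_config.json hunk above


def render_chat(messages, add_generation_prompt=False, bos_token=BOS_TOKEN):
    """Plain-Python mirror of the chat_template logic (illustrative only)."""
    parts = []
    for index, message in enumerate(messages):
        # Jinja's `| trim` binds to message['content'] only, so only the
        # content is stripped, not the surrounding header markers.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        if index == 0:
            # The template prefixes the first message with bos_token.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  Hello  "}],
    add_generation_prompt=True,
)
print(repr(prompt))
```

With a real tokenizer loaded, the equivalent call would normally go through the tokenizer's own chat-template support rather than a re-implementation like this one.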

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1755 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.2504454059557139,
+  "eval_steps": 500,
+  "global_step": 246,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0010180707559175363,
+      "grad_norm": 13.50528621673584,
+      "learning_rate": 2e-05,
+      "loss": 5.1969,
+      "step": 1
+    },
+    {
+      "epoch": 0.0020361415118350726,
+      "grad_norm": 19.771379470825195,
+      "learning_rate": 4e-05,
+      "loss": 7.7524,
+      "step": 2
+    },
+    {
+      "epoch": 0.003054212267752609,
+      "grad_norm": 19.734294891357422,
+      "learning_rate": 6e-05,
+      "loss": 8.0409,
+      "step": 3
+    },
+    {
+      "epoch": 0.004072283023670145,
+      "grad_norm": 24.712677001953125,
+      "learning_rate": 8e-05,
+      "loss": 7.7622,
+      "step": 4
+    },
+    {
+      "epoch": 0.0050903537795876815,
+      "grad_norm": 26.32750701904297,
+      "learning_rate": 0.0001,
+      "loss": 9.8505,
+      "step": 5
+    },
+    {
+      "epoch": 0.006108424535505218,
+      "grad_norm": 29.443405151367188,
+      "learning_rate": 9.999974203447433e-05,
+      "loss": 9.1155,
+      "step": 6
+    },
+    {
+      "epoch": 0.007126495291422754,
+      "grad_norm": 22.42616844177246,
+      "learning_rate": 9.999896814055916e-05,
+      "loss": 6.3676,
+      "step": 7
+    },
+    {
+      "epoch": 0.00814456604734029,
+      "grad_norm": 27.119375228881836,
+      "learning_rate": 9.999767832624001e-05,
+      "loss": 7.4125,
+      "step": 8
+    },
+    {
+      "epoch": 0.009162636803257827,
+      "grad_norm": 15.422072410583496,
+      "learning_rate": 9.999587260482597e-05,
+      "loss": 3.5393,
+      "step": 9
+    },
+    {
+      "epoch": 0.010180707559175363,
+      "grad_norm": 13.826335906982422,
+      "learning_rate": 9.999355099494962e-05,
+      "loss": 2.7136,
+      "step": 10
+    },
+    {
+      "epoch": 0.0111987783150929,
+      "grad_norm": 14.258417129516602,
+      "learning_rate": 9.999071352056675e-05,
+      "loss": 2.6158,
+      "step": 11
+    },
+    {
+      "epoch": 0.012216849071010435,
+      "grad_norm": 14.128898620605469,
+      "learning_rate": 9.99873602109562e-05,
+      "loss": 3.0587,
+      "step": 12
+    },
+    {
+      "epoch": 0.013234919826927972,
+      "grad_norm": 12.319880485534668,
+      "learning_rate": 9.998349110071949e-05,
+      "loss": 2.6488,
+      "step": 13
+    },
+    {
+      "epoch": 0.014252990582845508,
+      "grad_norm": 15.33985424041748,
+      "learning_rate": 9.99791062297805e-05,
+      "loss": 3.1476,
+      "step": 14
+    },
+    {
+      "epoch": 0.015271061338763044,
+      "grad_norm": 13.602853775024414,
+      "learning_rate": 9.99742056433851e-05,
+      "loss": 2.796,
+      "step": 15
+    },
+    {
+      "epoch": 0.01628913209468058,
+      "grad_norm": 9.083526611328125,
+      "learning_rate": 9.996878939210049e-05,
+      "loss": 2.0671,
+      "step": 16
+    },
+    {
+      "epoch": 0.017307202850598117,
+      "grad_norm": 10.980738639831543,
+      "learning_rate": 9.9962857531815e-05,
+      "loss": 2.629,
+      "step": 17
+    },
+    {
+      "epoch": 0.018325273606515653,
+      "grad_norm": 11.885966300964355,
+      "learning_rate": 9.99564101237372e-05,
+      "loss": 3.1609,
+      "step": 18
+    },
+    {
+      "epoch": 0.01934334436243319,
+      "grad_norm": 11.95373249053955,
+      "learning_rate": 9.994944723439546e-05,
+      "loss": 3.2291,
+      "step": 19
+    },
+    {
+      "epoch": 0.020361415118350726,
+      "grad_norm": 10.973458290100098,
+      "learning_rate": 9.994196893563721e-05,
+      "loss": 2.7778,
+      "step": 20
+    },
+    {
+      "epoch": 0.021379485874268262,
+      "grad_norm": 12.775362014770508,
+      "learning_rate": 9.993397530462818e-05,
+      "loss": 3.136,
+      "step": 21
+    },
+    {
+      "epoch": 0.0223975566301858,
+      "grad_norm": 10.916693687438965,
+      "learning_rate": 9.992546642385158e-05,
+      "loss": 2.3531,
+      "step": 22
+    },
+    {
+      "epoch": 0.023415627386103335,
+      "grad_norm": 12.338353157043457,
+      "learning_rate": 9.99164423811074e-05,
+      "loss": 3.6968,
+      "step": 23
+    },
+    {
+      "epoch": 0.02443369814202087,
+      "grad_norm": 13.168731689453125,
+      "learning_rate": 9.990690326951126e-05,
+      "loss": 3.0163,
+      "step": 24
+    },
+    {
+      "epoch": 0.025451768897938407,
+      "grad_norm": 11.88056755065918,
+      "learning_rate": 9.989684918749365e-05,
+      "loss": 3.0201,
+      "step": 25
+    },
+    {
+      "epoch": 0.026469839653855944,
+      "grad_norm": 9.301902770996094,
+      "learning_rate": 9.988628023879883e-05,
+      "loss": 2.6577,
+      "step": 26
+    },
+    {
+      "epoch": 0.02748791040977348,
+      "grad_norm": 10.02252197265625,
+      "learning_rate": 9.987519653248378e-05,
+      "loss": 2.4519,
+      "step": 27
+    },
+    {
+      "epoch": 0.028505981165691016,
+      "grad_norm": 11.7409029006958,
+      "learning_rate": 9.986359818291706e-05,
+      "loss": 3.1898,
+      "step": 28
+    },
+    {
+      "epoch": 0.029524051921608552,
+      "grad_norm": 10.492666244506836,
+      "learning_rate": 9.985148530977767e-05,
+      "loss": 2.7226,
+      "step": 29
+    },
+    {
+      "epoch": 0.03054212267752609,
+      "grad_norm": 12.711854934692383,
+      "learning_rate": 9.983885803805372e-05,
+      "loss": 2.7205,
+      "step": 30
+    },
+    {
+      "epoch": 0.03156019343344362,
+      "grad_norm": 12.937368392944336,
+      "learning_rate": 9.982571649804126e-05,
+      "loss": 3.204,
+      "step": 31
+    },
+    {
+      "epoch": 0.03257826418936116,
+      "grad_norm": 13.292618751525879,
+      "learning_rate": 9.981206082534286e-05,
+      "loss": 3.4911,
+      "step": 32
+    },
+    {
+      "epoch": 0.033596334945278694,
+      "grad_norm": 9.724308967590332,
+      "learning_rate": 9.979789116086625e-05,
+      "loss": 2.8603,
+      "step": 33
+    },
+    {
+      "epoch": 0.034614405701196234,
+      "grad_norm": 8.932650566101074,
+      "learning_rate": 9.978320765082278e-05,
+      "loss": 2.465,
+      "step": 34
+    },
+    {
+      "epoch": 0.03563247645711377,
+      "grad_norm": 10.732564926147461,
+      "learning_rate": 9.976801044672608e-05,
+      "loss": 2.9517,
+      "step": 35
+    },
+    {
+      "epoch": 0.036650547213031306,
+      "grad_norm": 11.971003532409668,
+      "learning_rate": 9.97522997053903e-05,
+      "loss": 3.2085,
+      "step": 36
+    },
+    {
+      "epoch": 0.03766861796894884,
+      "grad_norm": 10.34869384765625,
+      "learning_rate": 9.973607558892864e-05,
+      "loss": 2.9294,
+      "step": 37
+    },
+    {
+      "epoch": 0.03868668872486638,
+      "grad_norm": 12.5684814453125,
+      "learning_rate": 9.97193382647516e-05,
+      "loss": 3.6198,
+      "step": 38
+    },
+    {
+      "epoch": 0.03970475948078391,
+      "grad_norm": 13.42013931274414,
+      "learning_rate": 9.970208790556532e-05,
+      "loss": 2.8409,
+      "step": 39
+    },
+    {
+      "epoch": 0.04072283023670145,
+      "grad_norm": 10.357503890991211,
+      "learning_rate": 9.968432468936967e-05,
+      "loss": 2.6345,
+      "step": 40
+    },
+    {
+      "epoch": 0.041740900992618984,
+      "grad_norm": 12.668771743774414,
+      "learning_rate": 9.966604879945659e-05,
+      "loss": 3.2825,
+      "step": 41
+    },
+    {
+      "epoch": 0.042758971748536524,
+      "grad_norm": 9.444086074829102,
+      "learning_rate": 9.964726042440802e-05,
+      "loss": 2.7562,
+      "step": 42
+    },
+    {
+      "epoch": 0.04377704250445406,
+      "grad_norm": 10.448949813842773,
+      "learning_rate": 9.962795975809411e-05,
+      "loss": 2.8796,
+      "step": 43
+    },
+    {
+      "epoch": 0.0447951132603716,
+      "grad_norm": 10.916976928710938,
+      "learning_rate": 9.960814699967112e-05,
+      "loss": 2.7582,
+      "step": 44
+    },
+    {
+      "epoch": 0.04581318401628913,
+      "grad_norm": 11.323358535766602,
+      "learning_rate": 9.958782235357938e-05,
+      "loss": 2.6436,
+      "step": 45
+    },
+    {
+      "epoch": 0.04683125477220667,
+      "grad_norm": 10.28471851348877,
+      "learning_rate": 9.956698602954124e-05,
+      "loss": 2.8325,
+      "step": 46
+    },
+    {
+      "epoch": 0.0478493255281242,
+      "grad_norm": 13.5079984664917,
+      "learning_rate": 9.954563824255878e-05,
+      "loss": 3.0585,
+      "step": 47
+    },
+    {
+      "epoch": 0.04886739628404174,
+      "grad_norm": 11.250194549560547,
+      "learning_rate": 9.952377921291178e-05,
+      "loss": 2.7686,
+      "step": 48
+    },
+    {
+      "epoch": 0.049885467039959275,
+      "grad_norm": 12.554705619812012,
+      "learning_rate": 9.950140916615526e-05,
+      "loss": 3.0617,
+      "step": 49
+    },
+    {
+      "epoch": 0.050903537795876815,
+      "grad_norm": 12.552079200744629,
+      "learning_rate": 9.947852833311724e-05,
+      "loss": 2.08,
+      "step": 50
+    },
+    {
+      "epoch": 0.05192160855179435,
+      "grad_norm": 11.248369216918945,
+      "learning_rate": 9.945513694989639e-05,
+      "loss": 5.133,
+      "step": 51
+    },
+    {
+      "epoch": 0.05293967930771189,
+      "grad_norm": 12.866747856140137,
+      "learning_rate": 9.943123525785952e-05,
+      "loss": 5.7232,
+      "step": 52
+    },
+    {
+      "epoch": 0.05395775006362942,
+      "grad_norm": 12.395757675170898,
+      "learning_rate": 9.940682350363912e-05,
+      "loss": 4.6422,
+      "step": 53
+    },
+    {
+      "epoch": 0.05497582081954696,
+      "grad_norm": 12.23355770111084,
+      "learning_rate": 9.938190193913083e-05,
+      "loss": 4.8131,
+      "step": 54
+    },
+    {
+      "epoch": 0.05599389157546449,
+      "grad_norm": 14.62759017944336,
+      "learning_rate": 9.935647082149086e-05,
+      "loss": 6.0114,
+      "step": 55
+    },
+    {
+      "epoch": 0.05701196233138203,
+      "grad_norm": 13.613059997558594,
+      "learning_rate": 9.933053041313325e-05,
+      "loss": 4.794,
+      "step": 56
+    },
+    {
+      "epoch": 0.058030033087299565,
+      "grad_norm": 13.422719955444336,
+      "learning_rate": 9.930408098172725e-05,
+      "loss": 4.5392,
+      "step": 57
+    },
+    {
+      "epoch": 0.059048103843217105,
+      "grad_norm": 17.745412826538086,
+      "learning_rate": 9.92771228001945e-05,
+      "loss": 7.1147,
+      "step": 58
+    },
+    {
+      "epoch": 0.06006617459913464,
+      "grad_norm": 13.955183982849121,
+      "learning_rate": 9.924965614670629e-05,
+      "loss": 3.619,
+      "step": 59
+    },
+    {
+      "epoch": 0.06108424535505218,
+      "grad_norm": 11.067267417907715,
+      "learning_rate": 9.922168130468059e-05,
+      "loss": 2.6905,
+      "step": 60
+    },
+    {
+      "epoch": 0.06210231611096971,
+      "grad_norm": 11.641958236694336,
+      "learning_rate": 9.91931985627792e-05,
+      "loss": 2.398,
+      "step": 61
+    },
+    {
+      "epoch": 0.06312038686688724,
+      "grad_norm": 8.590779304504395,
+      "learning_rate": 9.916420821490472e-05,
+      "loss": 1.9248,
+      "step": 62
+    },
+    {
+      "epoch": 0.06413845762280479,
+      "grad_norm": 8.852486610412598,
+      "learning_rate": 9.91347105601976e-05,
+      "loss": 2.3876,
+      "step": 63
+    },
+    {
+      "epoch": 0.06515652837872232,
+      "grad_norm": 9.158111572265625,
+      "learning_rate": 9.910470590303293e-05,
+      "loss": 1.9339,
+      "step": 64
+    },
+    {
+      "epoch": 0.06617459913463986,
+      "grad_norm": 8.361588478088379,
+      "learning_rate": 9.907419455301741e-05,
+      "loss": 2.3266,
+      "step": 65
+    },
+    {
+      "epoch": 0.06719266989055739,
+      "grad_norm": 7.891152858734131,
+      "learning_rate": 9.904317682498608e-05,
+      "loss": 1.9775,
+      "step": 66
+    },
+    {
+      "epoch": 0.06821074064647493,
+      "grad_norm": 8.722708702087402,
+      "learning_rate": 9.901165303899916e-05,
+      "loss": 2.2988,
+      "step": 67
+    },
+    {
+      "epoch": 0.06922881140239247,
+      "grad_norm": 10.848478317260742,
+      "learning_rate": 9.897962352033861e-05,
+      "loss": 2.2087,
+      "step": 68
+    },
+    {
+      "epoch": 0.07024688215831,
+      "grad_norm": 7.828042984008789,
+      "learning_rate": 9.89470885995049e-05,
+      "loss": 2.1694,
+      "step": 69
+    },
+    {
+      "epoch": 0.07126495291422753,
+      "grad_norm": 7.928416728973389,
+      "learning_rate": 9.891404861221356e-05,
+      "loss": 1.7946,
+      "step": 70
+    },
+    {
+      "epoch": 0.07228302367014508,
+      "grad_norm": 8.273153305053711,
+      "learning_rate": 9.888050389939172e-05,
+      "loss": 2.2472,
+      "step": 71
+    },
+    {
+      "epoch": 0.07330109442606261,
+      "grad_norm": 7.866210460662842,
+      "learning_rate": 9.884645480717451e-05,
+      "loss": 1.9656,
+      "step": 72
+    },
+    {
+      "epoch": 0.07431916518198015,
+      "grad_norm": 9.140717506408691,
+      "learning_rate": 9.881190168690164e-05,
+      "loss": 2.5084,
+      "step": 73
+    },
+    {
+      "epoch": 0.07533723593789768,
+      "grad_norm": 10.078163146972656,
+      "learning_rate": 9.877684489511366e-05,
+      "loss": 2.8882,
+      "step": 74
+    },
+    {
+      "epoch": 0.07635530669381523,
+      "grad_norm": 8.583365440368652,
+      "learning_rate": 9.874128479354832e-05,
+      "loss": 2.2404,
+      "step": 75
+    },
+    {
+      "epoch": 0.07737337744973276,
+      "grad_norm": 10.980644226074219,
+      "learning_rate": 9.870522174913682e-05,
+      "loss": 2.9591,
+      "step": 76
+    },
+    {
+      "epoch": 0.07839144820565029,
+      "grad_norm": 9.829695701599121,
+      "learning_rate": 9.866865613400008e-05,
+      "loss": 2.5868,
+      "step": 77
+    },
+    {
+      "epoch": 0.07940951896156782,
+      "grad_norm": 9.993083000183105,
+      "learning_rate": 9.863158832544477e-05,
+      "loss": 2.7386,
+      "step": 78
+    },
+    {
+      "epoch": 0.08042758971748537,
+      "grad_norm": 9.227055549621582,
+      "learning_rate": 9.859401870595959e-05,
+      "loss": 2.3334,
+      "step": 79
+    },
+    {
+      "epoch": 0.0814456604734029,
+      "grad_norm": 9.135334968566895,
+      "learning_rate": 9.855594766321122e-05,
+      "loss": 2.6064,
+      "step": 80
+    },
+    {
+      "epoch": 0.08246373122932044,
+      "grad_norm": 9.216446876525879,
+      "learning_rate": 9.85173755900403e-05,
+      "loss": 2.9289,
+      "step": 81
+    },
+    {
+      "epoch": 0.08348180198523797,
+      "grad_norm": 12.71446418762207,
+      "learning_rate": 9.847830288445745e-05,
+      "loss": 3.5027,
+      "step": 82
+    },
+    {
+      "epoch": 0.08449987274115552,
+      "grad_norm": 9.071185111999512,
+      "learning_rate": 9.843872994963911e-05,
+      "loss": 3.1217,
+      "step": 83
+    },
+    {
+      "epoch": 0.08551794349707305,
+      "grad_norm": 7.825349807739258,
+      "learning_rate": 9.839865719392339e-05,
+      "loss": 2.4812,
+      "step": 84
+    },
+    {
+      "epoch": 0.08653601425299058,
+      "grad_norm": 11.979453086853027,
+      "learning_rate": 9.835808503080585e-05,
+      "loss": 3.6076,
+      "step": 85
+    },
+    {
+      "epoch": 0.08755408500890811,
+      "grad_norm": 10.889570236206055,
+      "learning_rate": 9.831701387893533e-05,
+      "loss": 3.9539,
+      "step": 86
+    },
+    {
+      "epoch": 0.08857215576482566,
+      "grad_norm": 6.638063430786133,
+      "learning_rate": 9.827544416210941e-05,
+      "loss": 2.1225,
+      "step": 87
+    },
+    {
+      "epoch": 0.0895902265207432,
+      "grad_norm": 11.630864143371582,
+      "learning_rate": 9.823337630927026e-05,
+      "loss": 2.8508,
+      "step": 88
+    },
+    {
+      "epoch": 0.09060829727666073,
+      "grad_norm": 11.906623840332031,
+      "learning_rate": 9.819081075450014e-05,
+      "loss": 3.0837,
+      "step": 89
+    },
+    {
+      "epoch": 0.09162636803257826,
+      "grad_norm": 12.019804000854492,
+      "learning_rate": 9.814774793701687e-05,
+      "loss": 3.6106,
+      "step": 90
+    },
+    {
+      "epoch": 0.0926444387884958,
+      "grad_norm": 7.91819953918457,
+      "learning_rate": 9.810418830116932e-05,
+      "loss": 2.3236,
+      "step": 91
+    },
+    {
+      "epoch": 0.09366250954441334,
+      "grad_norm": 9.185378074645996,
+      "learning_rate": 9.806013229643289e-05,
+      "loss": 2.6397,
+      "step": 92
+    },
+    {
+      "epoch": 0.09468058030033087,
+      "grad_norm": 12.451518058776855,
+      "learning_rate": 9.801558037740478e-05,
+      "loss": 3.3661,
+      "step": 93
+    },
+    {
+      "epoch": 0.0956986510562484,
+      "grad_norm": 9.665090560913086,
+      "learning_rate": 9.797053300379937e-05,
+      "loss": 2.7933,
+      "step": 94
+    },
+    {
+      "epoch": 0.09671672181216595,
+      "grad_norm": 9.512073516845703,
+      "learning_rate": 9.792499064044342e-05,
+      "loss": 3.1669,
+      "step": 95
+    },
+    {
+      "epoch": 0.09773479256808348,
+      "grad_norm": 11.063192367553711,
+      "learning_rate": 9.787895375727136e-05,
+      "loss": 2.4502,
+      "step": 96
+    },
+    {
+      "epoch": 0.09875286332400102,
+      "grad_norm": 11.608457565307617,
+      "learning_rate": 9.783242282932028e-05,
+      "loss": 2.5691,
+      "step": 97
+    },
+    {
+      "epoch": 0.09977093407991855,
+      "grad_norm": 10.834481239318848,
+      "learning_rate": 9.778539833672524e-05,
+      "loss": 2.8208,
+      "step": 98
+    },
+    {
+      "epoch": 0.1007890048358361,
+      "grad_norm": 9.476598739624023,
+      "learning_rate": 9.773788076471414e-05,
+      "loss": 2.4245,
+      "step": 99
+    },
+    {
+      "epoch": 0.10180707559175363,
+      "grad_norm": 10.453302383422852,
+      "learning_rate": 9.768987060360279e-05,
+      "loss": 2.1369,
+      "step": 100
+    },
+    {
+      "epoch": 0.10282514634767116,
+      "grad_norm": 8.380644798278809,
+      "learning_rate": 9.764136834878986e-05,
+      "loss": 4.4008,
+      "step": 101
+    },
+    {
+      "epoch": 0.1038432171035887,
+      "grad_norm": 10.45700740814209,
+      "learning_rate": 9.759237450075174e-05,
+      "loss": 3.8277,
+      "step": 102
+    },
+    {
+      "epoch": 0.10486128785950624,
+      "grad_norm": 11.106316566467285,
+      "learning_rate": 9.754288956503736e-05,
+      "loss": 4.3912,
+      "step": 103
+    },
+    {
+      "epoch": 0.10587935861542377,
+      "grad_norm": 12.727373123168945,
+      "learning_rate": 9.749291405226305e-05,
+      "loss": 5.0723,
+      "step": 104
+    },
+    {
+      "epoch": 0.10689742937134131,
+      "grad_norm": 11.3184175491333,
+      "learning_rate": 9.744244847810716e-05,
+      "loss": 4.6612,
+      "step": 105
+    },
+    {
+      "epoch": 0.10791550012725884,
+      "grad_norm": 11.49225902557373,
+      "learning_rate": 9.739149336330482e-05,
+      "loss": 5.2688,
+      "step": 106
+    },
+    {
+      "epoch": 0.10893357088317639,
+      "grad_norm": 9.92116928100586,
+      "learning_rate": 9.734004923364257e-05,
+      "loss": 3.1285,
+      "step": 107
+    },
+    {
+      "epoch": 0.10995164163909392,
+      "grad_norm": 16.322154998779297,
+      "learning_rate": 9.728811661995288e-05,
+      "loss": 4.3573,
+      "step": 108
+    },
+    {
+      "epoch": 0.11096971239501145,
+      "grad_norm": 11.590410232543945,
+      "learning_rate": 9.723569605810871e-05,
+      "loss": 3.3457,
+      "step": 109
+    },
+    {
+      "epoch": 0.11198778315092899,
+      "grad_norm": 6.267991065979004,
+      "learning_rate": 9.718278808901797e-05,
+      "loss": 1.8973,
+      "step": 110
+    },
+    {
+      "epoch": 0.11300585390684653,
+      "grad_norm": 7.807132720947266,
+      "learning_rate": 9.712939325861794e-05,
+      "loss": 2.2999,
+      "step": 111
+    },
+    {
+      "epoch": 0.11402392466276406,
+      "grad_norm": 5.800601005554199,
+      "learning_rate": 9.707551211786965e-05,
+      "loss": 1.0863,
+      "step": 112
+    },
+    {
+      "epoch": 0.1150419954186816,
+      "grad_norm": 7.150589466094971,
+      "learning_rate": 9.702114522275216e-05,
+      "loss": 1.9172,
+      "step": 113
+    },
+    {
+      "epoch": 0.11606006617459913,
+      "grad_norm": 8.134252548217773,
+      "learning_rate": 9.696629313425686e-05,
+      "loss": 2.2173,
+      "step": 114
+    },
+    {
+      "epoch": 0.11707813693051668,
+      "grad_norm": 7.6389689445495605,
+      "learning_rate": 9.691095641838169e-05,
+      "loss": 1.8046,
+      "step": 115
+    },
+    {
+      "epoch": 0.11809620768643421,
+      "grad_norm": 6.845970153808594,
+      "learning_rate": 9.685513564612521e-05,
+      "loss": 1.9059,
+      "step": 116
+    },
+    {
+      "epoch": 0.11911427844235174,
+      "grad_norm": 10.888468742370605,
+      "learning_rate": 9.679883139348082e-05,
+      "loss": 2.9148,
+      "step": 117
+    },
+    {
+      "epoch": 0.12013234919826928,
+      "grad_norm": 6.594396114349365,
+      "learning_rate": 9.674204424143078e-05,
+      "loss": 1.8292,
+      "step": 118
+    },
+    {
+      "epoch": 0.12115041995418682,
+      "grad_norm": 7.157876491546631,
+      "learning_rate": 9.66847747759402e-05,
+      "loss": 1.6858,
+      "step": 119
+    },
+    {
+      "epoch": 0.12216849071010435,
+      "grad_norm": 7.298995494842529,
+      "learning_rate": 9.662702358795098e-05,
+      "loss": 1.7957,
+      "step": 120
+    },
+    {
+      "epoch": 0.12318656146602189,
+      "grad_norm": 9.0108003616333,
+      "learning_rate": 9.656879127337571e-05,
+      "loss": 2.2843,
+      "step": 121
+    },
+    {
+      "epoch": 0.12420463222193942,
+      "grad_norm": 8.476913452148438,
+      "learning_rate": 9.651007843309163e-05,
+      "loss": 2.1026,
+      "step": 122
+    },
+    {
+      "epoch": 0.12522270297785695,
+      "grad_norm": 9.930148124694824,
+      "learning_rate": 9.645088567293426e-05,
+      "loss": 2.6976,
+      "step": 123
+    },
+    {
+      "epoch": 0.1262407737337745,
+      "grad_norm": 8.574073791503906,
+      "learning_rate": 9.639121360369126e-05,
+      "loss": 1.7768,
+      "step": 124
+    },
+    {
+      "epoch": 0.12725884448969205,
+      "grad_norm": 13.36725902557373,
+      "learning_rate": 9.63310628410961e-05,
+      "loss": 2.7559,
+      "step": 125
+    },
+    {
+      "epoch": 0.12827691524560958,
+      "grad_norm": 8.55522346496582,
+      "learning_rate": 9.627043400582172e-05,
+      "loss": 2.3419,
+      "step": 126
+    },
+    {
+      "epoch": 0.1292949860015271,
+      "grad_norm": 9.948506355285645,
+      "learning_rate": 9.620932772347408e-05,
+      "loss": 3.0092,
+      "step": 127
+    },
+    {
+      "epoch": 0.13031305675744465,
+      "grad_norm": 10.05156135559082,
+      "learning_rate": 9.614774462458573e-05,
+      "loss": 2.1554,
+      "step": 128
+    },
+    {
+      "epoch": 0.13133112751336218,
+      "grad_norm": 10.230545043945312,
+      "learning_rate": 9.608568534460936e-05,
+      "loss": 2.572,
+      "step": 129
+    },
+    {
+      "epoch": 0.1323491982692797,
+      "grad_norm": 7.820633411407471,
+      "learning_rate": 9.602315052391115e-05,
+      "loss": 2.2316,
+      "step": 130
+    },
+    {
+      "epoch": 0.13336726902519724,
+      "grad_norm": 7.196948528289795,
+      "learning_rate": 9.596014080776423e-05,
+      "loss": 2.276,
+      "step": 131
+    },
+    {
+      "epoch": 0.13438533978111478,
+      "grad_norm": 10.125378608703613,
+      "learning_rate": 9.589665684634196e-05,
+      "loss": 3.6436,
+      "step": 132
+    },
+    {
+      "epoch": 0.13540341053703234,
+      "grad_norm": 8.542695045471191,
+      "learning_rate": 9.583269929471128e-05,
+      "loss": 2.8726,
+      "step": 133
+    },
+    {
+      "epoch": 0.13642148129294987,
+      "grad_norm": 8.097149848937988,
+      "learning_rate": 9.576826881282594e-05,
+      "loss": 2.3483,
+      "step": 134
+    },
+    {
+      "epoch": 0.1374395520488674,
+      "grad_norm": 8.922883987426758,
+      "learning_rate": 9.570336606551967e-05,
+      "loss": 2.5365,
+      "step": 135
+    },
+    {
+      "epoch": 0.13845762280478494,
+      "grad_norm": 9.18602180480957,
+      "learning_rate": 9.56379917224993e-05,
+      "loss": 2.7464,
+      "step": 136
+    },
+    {
+      "epoch": 0.13947569356070247,
+      "grad_norm": 8.929719924926758,
+      "learning_rate": 9.557214645833792e-05,
+      "loss": 2.8074,
+      "step": 137
+    },
+    {
+      "epoch": 0.14049376431662,
+      "grad_norm": 10.157453536987305,
+      "learning_rate": 9.550583095246786e-05,
+      "loss": 2.6313,
+      "step": 138
+    },
+    {
+      "epoch": 0.14151183507253753,
+      "grad_norm": 8.677960395812988,
+      "learning_rate": 9.543904588917367e-05,
+      "loss": 2.7515,
+      "step": 139
+    },
+    {
+      "epoch": 0.14252990582845507,
+      "grad_norm": 8.684197425842285,
+      "learning_rate": 9.537179195758512e-05,
+      "loss": 2.5564,
+      "step": 140
+    },
+    {
+      "epoch": 0.14354797658437263,
+      "grad_norm": 8.283134460449219,
+      "learning_rate": 9.530406985167004e-05,
+      "loss": 2.3474,
+      "step": 141
+    },
+    {
+      "epoch": 0.14456604734029016,
+      "grad_norm": 7.090147018432617,
+      "learning_rate": 9.523588027022721e-05,
+      "loss": 2.0495,
+      "step": 142
+    },
+    {
+      "epoch": 0.1455841180962077,
+      "grad_norm": 9.59614086151123,
+      "learning_rate": 9.516722391687902e-05,
+      "loss": 2.4563,
+      "step": 143
+    },
+    {
+      "epoch": 0.14660218885212523,
+      "grad_norm": 7.75164270401001,
+      "learning_rate": 9.50981015000644e-05,
+      "loss": 2.0795,
+      "step": 144
+    },
+    {
+      "epoch": 0.14762025960804276,
+      "grad_norm": 9.117147445678711,
+      "learning_rate": 9.502851373303136e-05,
+      "loss": 2.519,
+      "step": 145
+    },
+    {
+      "epoch": 0.1486383303639603,
+      "grad_norm": 9.871448516845703,
+      "learning_rate": 9.495846133382973e-05,
+      "loss": 2.6371,
+      "step": 146
+    },
+    {
+      "epoch": 0.14965640111987782,
+      "grad_norm": 8.246638298034668,
+      "learning_rate": 9.488794502530362e-05,
+      "loss": 2.3142,
+      "step": 147
+    },
+    {
+      "epoch": 0.15067447187579536,
+      "grad_norm": 11.579840660095215,
+      "learning_rate": 9.48169655350841e-05,
+      "loss": 2.8947,
+      "step": 148
+    },
+    {
+      "epoch": 0.15169254263171292,
+      "grad_norm": 13.307292938232422,
+      "learning_rate": 9.474552359558166e-05,
+      "loss": 2.9942,
+      "step": 149
+    },
+    {
+      "epoch": 0.15271061338763045,
+      "grad_norm": 10.210186958312988,
+      "learning_rate": 9.467361994397859e-05,
+      "loss": 2.0216,
+      "step": 150
+    },
+    {
+      "epoch": 0.15372868414354798,
+      "grad_norm": 7.870486259460449,
+      "learning_rate": 9.460125532222141e-05,
+      "loss": 2.6203,
+      "step": 151
+    },
+    {
+      "epoch": 0.15474675489946552,
+      "grad_norm": 13.753894805908203,
+      "learning_rate": 9.452843047701323e-05,
+      "loss": 4.1998,
+      "step": 152
+    },
+    {
+      "epoch": 0.15576482565538305,
+      "grad_norm": 10.677061080932617,
+      "learning_rate": 9.445514615980604e-05,
+      "loss": 3.9647,
+      "step": 153
+    },
+    {
+      "epoch": 0.15678289641130058,
+      "grad_norm": 11.903203010559082,
+      "learning_rate": 9.438140312679291e-05,
+      "loss": 4.2215,
+      "step": 154
+    },
+    {
+      "epoch": 0.15780096716721811,
+      "grad_norm": 12.882353782653809,
+      "learning_rate": 9.43072021389003e-05,
+      "loss": 5.0153,
+      "step": 155
+    },
+    {
+      "epoch": 0.15881903792313565,
+      "grad_norm": 13.99023151397705,
+      "learning_rate": 9.423254396178003e-05,
+      "loss": 5.5362,
+      "step": 156
+    },
+    {
+      "epoch": 0.1598371086790532,
+      "grad_norm": 16.683727264404297,
+      "learning_rate": 9.415742936580157e-05,
+      "loss": 5.1149,
+      "step": 157
+    },
+    {
+      "epoch": 0.16085517943497074,
+      "grad_norm": 17.32396125793457,
+      "learning_rate": 9.408185912604394e-05,
+      "loss": 4.8563,
+      "step": 158
+    },
+    {
+      "epoch": 0.16187325019088827,
+      "grad_norm": 14.138668060302734,
+      "learning_rate": 9.400583402228784e-05,
+      "loss": 3.4698,
+      "step": 159
+    },
+    {
+      "epoch": 0.1628913209468058,
+      "grad_norm": 6.4397430419921875,
+      "learning_rate": 9.392935483900749e-05,
+      "loss": 1.8856,
+      "step": 160
+    },
+    {
+      "epoch": 0.16390939170272334,
+      "grad_norm": 4.72169303894043,
+      "learning_rate": 9.38524223653626e-05,
+      "loss": 1.3027,
+      "step": 161
+    },
+    {
+      "epoch": 0.16492746245864087,
+      "grad_norm": 7.877247333526611,
+      "learning_rate": 9.377503739519019e-05,
+      "loss": 1.9129,
+      "step": 162
+    },
+    {
+      "epoch": 0.1659455332145584,
+      "grad_norm": 8.524123191833496,
+      "learning_rate": 9.369720072699647e-05,
+      "loss": 1.5605,
+      "step": 163
+    },
+    {
+      "epoch": 0.16696360397047594,
+      "grad_norm": 9.966007232666016,
+      "learning_rate": 9.361891316394851e-05,
+      "loss": 2.5458,
+      "step": 164
+    },
+    {
+      "epoch": 0.16798167472639347,
+      "grad_norm": 9.061026573181152,
+      "learning_rate": 9.354017551386599e-05,
+      "loss": 1.8415,
+      "step": 165
+    },
+    {
+      "epoch": 0.16899974548231103,
+      "grad_norm": 7.912156581878662,
+      "learning_rate": 9.346098858921291e-05,
+      "loss": 1.9514,
+      "step": 166
+    },
+    {
+      "epoch": 0.17001781623822856,
+      "grad_norm": 6.926218509674072,
+      "learning_rate": 9.338135320708911e-05,
+      "loss": 2.1861,
+      "step": 167
+    },
+    {
+      "epoch": 0.1710358869941461,
+      "grad_norm": 7.546460151672363,
+      "learning_rate": 9.330127018922194e-05,
+      "loss": 1.7336,
+      "step": 168
+    },
+    {
+      "epoch": 0.17205395775006363,
+      "grad_norm": 6.780023097991943,
+      "learning_rate": 9.322074036195769e-05,
+      "loss": 1.766,
+      "step": 169
+    },
+    {
+      "epoch": 0.17307202850598116,
+      "grad_norm": 8.207006454467773,
+      "learning_rate": 9.313976455625315e-05,
+      "loss": 1.937,
+      "step": 170
+    },
+    {
+      "epoch": 0.1740900992618987,
+      "grad_norm": 10.892253875732422,
+      "learning_rate": 9.305834360766695e-05,
+      "loss": 2.6682,
+      "step": 171
+    },
+    {
+      "epoch": 0.17510817001781623,
+      "grad_norm": 8.318902015686035,
+      "learning_rate": 9.297647835635102e-05,
+      "loss": 2.0102,
+      "step": 172
+    },
+    {
+      "epoch": 0.17612624077373376,
+      "grad_norm": 7.727786540985107,
+      "learning_rate": 9.289416964704185e-05,
+      "loss": 1.9714,
+      "step": 173
+    },
+    {
+      "epoch": 0.17714431152965132,
+      "grad_norm": 9.250336647033691,
+      "learning_rate": 9.281141832905185e-05,
+      "loss": 2.3855,
+      "step": 174
+    },
+    {
+      "epoch": 0.17816238228556885,
+      "grad_norm": 7.347965717315674,
+      "learning_rate": 9.272822525626046e-05,
+      "loss": 1.8475,
+      "step": 175
+    },
+    {
+      "epoch": 0.1791804530414864,
+      "grad_norm": 7.1732354164123535,
+      "learning_rate": 9.26445912871055e-05,
+      "loss": 1.9938,
+      "step": 176
+    },
+    {
+      "epoch": 0.18019852379740392,
+      "grad_norm": 11.556361198425293,
+      "learning_rate": 9.25605172845742e-05,
+      "loss": 3.3699,
+      "step": 177
+    },
+    {
+      "epoch": 0.18121659455332145,
+      "grad_norm": 9.626664161682129,
+      "learning_rate": 9.247600411619434e-05,
+      "loss": 2.7054,
+      "step": 178
+    },
+    {
+      "epoch": 0.18223466530923899,
+      "grad_norm": 7.422823429107666,
+      "learning_rate": 9.239105265402525e-05,
+      "loss": 2.3665,
+      "step": 179
+    },
+    {
+      "epoch": 0.18325273606515652,
+      "grad_norm": 8.812822341918945,
+      "learning_rate": 9.23056637746489e-05,
+      "loss": 2.4336,
+      "step": 180
+    },
+    {
+      "epoch": 0.18427080682107405,
+      "grad_norm": 12.493931770324707,
+      "learning_rate": 9.221983835916074e-05,
+      "loss": 2.4446,
+      "step": 181
+    },
+    {
+      "epoch": 0.1852888775769916,
+      "grad_norm": 9.533077239990234,
+      "learning_rate": 9.213357729316076e-05,
+      "loss": 2.5195,
+      "step": 182
+    },
+    {
+      "epoch": 0.18630694833290914,
+      "grad_norm": 7.195649147033691,
+      "learning_rate": 9.204688146674418e-05,
+      "loss": 1.5695,
+      "step": 183
+    },
+    {
+      "epoch": 0.18732501908882668,
+      "grad_norm": 10.850951194763184,
+      "learning_rate": 9.195975177449238e-05,
+      "loss": 3.3308,
+      "step": 184
+    },
+    {
+      "epoch": 0.1883430898447442,
+      "grad_norm": 9.36767578125,
+      "learning_rate": 9.187218911546362e-05,
+      "loss": 2.8146,
+      "step": 185
+    },
+    {
+      "epoch": 0.18936116060066174,
+      "grad_norm": 14.791803359985352,
+      "learning_rate": 9.178419439318382e-05,
+      "loss": 3.5093,
+      "step": 186
+    },
+    {
+      "epoch": 0.19037923135657928,
+      "grad_norm": 10.107565879821777,
+      "learning_rate": 9.169576851563715e-05,
+      "loss": 2.4756,
+      "step": 187
+    },
+    {
+      "epoch": 0.1913973021124968,
+      "grad_norm": 8.8936128616333,
+      "learning_rate": 9.160691239525674e-05,
+      "loss": 2.4272,
+      "step": 188
+    },
+    {
+      "epoch": 0.19241537286841434,
+      "grad_norm": 8.861714363098145,
+      "learning_rate": 9.151762694891521e-05,
+      "loss": 2.1092,
+      "step": 189
+    },
+    {
+      "epoch": 0.1934334436243319,
+      "grad_norm": 9.74419116973877,
+      "learning_rate": 9.142791309791528e-05,
+      "loss": 3.1339,
+      "step": 190
+    },
+    {
+      "epoch": 0.19445151438024944,
+      "grad_norm": 10.207488059997559,
+      "learning_rate": 9.133777176798013e-05,
+      "loss": 2.5119,
+      "step": 191
+    },
+    {
+      "epoch": 0.19546958513616697,
+      "grad_norm": 9.463604927062988,
+      "learning_rate": 9.124720388924403e-05,
+      "loss": 2.669,
+      "step": 192
+    },
+    {
+      "epoch": 0.1964876558920845,
+      "grad_norm": 11.191435813903809,
+      "learning_rate": 9.115621039624256e-05,
+      "loss": 3.134,
+      "step": 193
+    },
+    {
+      "epoch": 0.19750572664800203,
+      "grad_norm": 8.744293212890625,
+      "learning_rate": 9.10647922279031e-05,
+      "loss": 2.8205,
+      "step": 194
+    },
+    {
+      "epoch": 0.19852379740391957,
+      "grad_norm": 9.338461875915527,
+      "learning_rate": 9.09729503275351e-05,
+      "loss": 2.2502,
+      "step": 195
+    },
+    {
+      "epoch": 0.1995418681598371,
+      "grad_norm": 8.457433700561523,
+      "learning_rate": 9.088068564282031e-05,
+      "loss": 2.1407,
+      "step": 196
+    },
+    {
+      "epoch": 0.20055993891575463,
+      "grad_norm": 11.790545463562012,
+      "learning_rate": 9.078799912580304e-05,
+      "loss": 3.0246,
+      "step": 197
+    },
+    {
+      "epoch": 0.2015780096716722,
+      "grad_norm": 10.485797882080078,
+      "learning_rate": 9.069489173288038e-05,
+      "loss": 2.7989,
+      "step": 198
+    },
+    {
+      "epoch": 0.20259608042758973,
+      "grad_norm": 10.064512252807617,
+      "learning_rate": 9.060136442479215e-05,
+      "loss": 2.3104,
+      "step": 199
+    },
+    {
+      "epoch": 0.20361415118350726,
+      "grad_norm": 11.273386001586914,
+      "learning_rate": 9.050741816661128e-05,
+      "loss": 2.1308,
+      "step": 200
+    },
+    {
+      "epoch": 0.2046322219394248,
+      "grad_norm": 7.872629642486572,
+      "learning_rate": 9.041305392773354e-05,
+      "loss": 3.2454,
+      "step": 201
+    },
+    {
+      "epoch": 0.20565029269534232,
+      "grad_norm": 10.097418785095215,
+      "learning_rate": 9.031827268186779e-05,
+      "loss": 3.8778,
+      "step": 202
+    },
+    {
+      "epoch": 0.20666836345125986,
+      "grad_norm": 9.544397354125977,
+      "learning_rate": 9.022307540702576e-05,
+      "loss": 3.5354,
+      "step": 203
+    },
+    {
+      "epoch": 0.2076864342071774,
+      "grad_norm": 13.447309494018555,
+      "learning_rate": 9.012746308551208e-05,
+      "loss": 5.3594,
+      "step": 204
+    },
+    {
+      "epoch": 0.20870450496309492,
+      "grad_norm": 12.501740455627441,
+      "learning_rate": 9.003143670391403e-05,
+      "loss": 3.5315,
+      "step": 205
+    },
+    {
+      "epoch": 0.20972257571901248,
+      "grad_norm": 13.571687698364258,
+      "learning_rate": 8.993499725309148e-05,
+      "loss": 4.0421,
+      "step": 206
+    },
+    {
+      "epoch": 0.21074064647493002,
+      "grad_norm": 14.879913330078125,
+      "learning_rate": 8.983814572816656e-05,
+      "loss": 4.1594,
+      "step": 207
+    },
+    {
+      "epoch": 0.21175871723084755,
+      "grad_norm": 17.623329162597656,
+      "learning_rate": 8.974088312851345e-05,
+      "loss": 4.9946,
+      "step": 208
+    },
+    {
+      "epoch": 0.21277678798676508,
+      "grad_norm": 6.669205665588379,
+      "learning_rate": 8.964321045774807e-05,
+      "loss": 1.5305,
+      "step": 209
+    },
+    {
+      "epoch": 0.21379485874268261,
+      "grad_norm": 9.656936645507812,
+      "learning_rate": 8.954512872371769e-05,
+      "loss": 2.7299,
+      "step": 210
+    },
+    {
+      "epoch": 0.21481292949860015,
+      "grad_norm": 7.008784770965576,
+      "learning_rate": 8.944663893849052e-05,
+      "loss": 1.4462,
+      "step": 211
+    },
+    {
+      "epoch": 0.21583100025451768,
+      "grad_norm": 6.301548004150391,
+      "learning_rate": 8.934774211834538e-05,
+      "loss": 1.4093,
+      "step": 212
+    },
+    {
+      "epoch": 0.2168490710104352,
+      "grad_norm": 7.544199466705322,
+      "learning_rate": 8.924843928376104e-05,
+      "loss": 1.6221,
+      "step": 213
+    },
+    {
+      "epoch": 0.21786714176635277,
+      "grad_norm": 9.308175086975098,
+      "learning_rate": 8.914873145940584e-05,
+      "loss": 2.1724,
+      "step": 214
+    },
+    {
+      "epoch": 0.2188852125222703,
+      "grad_norm": 8.202116012573242,
+      "learning_rate": 8.904861967412703e-05,
+      "loss": 1.7294,
+      "step": 215
+    },
+    {
+      "epoch": 0.21990328327818784,
+      "grad_norm": 9.309891700744629,
+      "learning_rate": 8.894810496094016e-05,
+      "loss": 2.1319,
+      "step": 216
+    },
+    {
+      "epoch": 0.22092135403410537,
+      "grad_norm": 7.8817925453186035,
+      "learning_rate": 8.884718835701848e-05,
+      "loss": 2.0479,
+      "step": 217
+    },
+    {
+      "epoch": 0.2219394247900229,
+      "grad_norm": 7.9436116218566895,
+      "learning_rate": 8.874587090368221e-05,
+      "loss": 1.9141,
+      "step": 218
+    },
+    {
+      "epoch": 0.22295749554594044,
+      "grad_norm": 9.188081741333008,
+      "learning_rate": 8.86441536463877e-05,
+      "loss": 2.5944,
+      "step": 219
+    },
+    {
+      "epoch": 0.22397556630185797,
+      "grad_norm": 9.442697525024414,
+      "learning_rate": 8.85420376347168e-05,
+      "loss": 2.616,
+      "step": 220
+    },
+    {
+      "epoch": 0.2249936370577755,
+      "grad_norm": 7.059047222137451,
+      "learning_rate": 8.843952392236594e-05,
+      "loss": 1.8199,
+      "step": 221
+    },
+    {
+      "epoch": 0.22601170781369306,
+      "grad_norm": 9.448399543762207,
+      "learning_rate": 8.833661356713528e-05,
+      "loss": 2.2707,
+      "step": 222
+    },
+    {
+      "epoch": 0.2270297785696106,
+      "grad_norm": 7.232347011566162,
+      "learning_rate": 8.823330763091775e-05,
+      "loss": 2.2834,
+      "step": 223
+    },
+    {
+      "epoch": 0.22804784932552813,
+      "grad_norm": 7.126833438873291,
+      "learning_rate": 8.812960717968818e-05,
+      "loss": 2.2613,
+      "step": 224
+    },
+    {
+      "epoch": 0.22906592008144566,
+      "grad_norm": 7.250087261199951,
+      "learning_rate": 8.802551328349222e-05,
+      "loss": 2.0233,
+      "step": 225
+    },
+    {
+      "epoch": 0.2300839908373632,
+      "grad_norm": 9.801566123962402,
+      "learning_rate": 8.792102701643531e-05,
+      "loss": 2.6283,
+      "step": 226
+    },
+    {
+      "epoch": 0.23110206159328073,
+      "grad_norm": 8.86218547821045,
+      "learning_rate": 8.781614945667169e-05,
+      "loss": 2.7821,
+      "step": 227
+    },
+    {
+      "epoch": 0.23212013234919826,
+      "grad_norm": 7.009481430053711,
+      "learning_rate": 8.771088168639312e-05,
+      "loss": 2.187,
+      "step": 228
+    },
+    {
+      "epoch": 0.2331382031051158,
+      "grad_norm": 7.643123149871826,
+      "learning_rate": 8.760522479181784e-05,
+      "loss": 2.0065,
+      "step": 229
+    },
+    {
+      "epoch": 0.23415627386103335,
+      "grad_norm": 6.573335647583008,
+      "learning_rate": 8.749917986317928e-05,
+      "loss": 1.939,
+      "step": 230
+    },
+    {
+      "epoch": 0.2351743446169509,
+      "grad_norm": 9.001991271972656,
+      "learning_rate": 8.73927479947149e-05,
+      "loss": 2.8534,
+      "step": 231
+    },
+    {
+      "epoch": 0.23619241537286842,
+      "grad_norm": 9.186355590820312,
+      "learning_rate": 8.72859302846548e-05,
+      "loss": 3.112,
+      "step": 232
+    },
+    {
+      "epoch": 0.23721048612878595,
+      "grad_norm": 9.961040496826172,
+      "learning_rate": 8.717872783521047e-05,
+      "loss": 3.2593,
+      "step": 233
+    },
+    {
+      "epoch": 0.23822855688470349,
+      "grad_norm": 8.34619426727295,
+      "learning_rate": 8.707114175256335e-05,
+      "loss": 2.2664,
+      "step": 234
+    },
+    {
+      "epoch": 0.23924662764062102,
+      "grad_norm": 7.473055839538574,
+      "learning_rate": 8.696317314685341e-05,
+      "loss": 2.8765,
+      "step": 235
+    },
+    {
+      "epoch": 0.24026469839653855,
+      "grad_norm": 6.791398048400879,
+      "learning_rate": 8.685482313216783e-05,
+      "loss": 2.098,
+      "step": 236
+    },
+    {
+      "epoch": 0.24128276915245608,
+      "grad_norm": 9.765985488891602,
+      "learning_rate": 8.674609282652934e-05,
+      "loss": 3.2374,
+      "step": 237
+    },
+    {
+      "epoch": 0.24230083990837364,
+      "grad_norm": 7.459610462188721,
+      "learning_rate": 8.663698335188477e-05,
+      "loss": 2.456,
+      "step": 238
+    },
+    {
+      "epoch": 0.24331891066429118,
+      "grad_norm": 8.42564868927002,
+      "learning_rate": 8.65274958340934e-05,
+      "loss": 2.3464,
+      "step": 239
+    },
+    {
+      "epoch": 0.2443369814202087,
+      "grad_norm": 7.114076137542725,
+      "learning_rate": 8.641763140291545e-05,
+      "loss": 2.1128,
+      "step": 240
+    },
+    {
+      "epoch": 0.24535505217612624,
+      "grad_norm": 9.573076248168945,
+      "learning_rate": 8.630739119200035e-05,
+      "loss": 2.4448,
+      "step": 241
+    },
+    {
+      "epoch": 0.24637312293204378,
+      "grad_norm": 7.850905895233154,
+      "learning_rate": 8.619677633887509e-05,
+      "loss": 2.446,
+      "step": 242
+    },
+    {
+      "epoch": 0.2473911936879613,
+      "grad_norm": 9.630354881286621,
+      "learning_rate": 8.608578798493236e-05,
+      "loss": 2.3875,
+      "step": 243
+    },
+    {
+      "epoch": 0.24840926444387884,
+      "grad_norm": 7.196229457855225,
+      "learning_rate": 8.597442727541897e-05,
+      "loss": 1.6186,
+      "step": 244
+    },
+    {
+      "epoch": 0.24942733519979637,
+      "grad_norm": 11.08008098602295,
+      "learning_rate": 8.586269535942385e-05,
+      "loss": 2.839,
+      "step": 245
+    },
+    {
+      "epoch": 0.2504454059557139,
+      "grad_norm": 8.258538246154785,
+      "learning_rate": 8.575059338986633e-05,
+      "loss": 2.2807,
+      "step": 246
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 983,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 246,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.0243359280922624e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:77f28005a766b5d499bb48d8ea7291f29391a014397e511bd39d8f0103e6dadb
+size 6776