Upload folder using huggingface_hub

Files changed (12)

README.md +202 -0
adapter_config.json +32 -0
adapter_model.safetensors +3 -0
added_tokens.json +3 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +30 -0
tokenizer.model +3 -0
tokenizer_config.json +52 -0
trainer_state.json +1727 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: meta-llama/Llama-2-7b-hf
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

adapter_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "meta-llama/Llama-2-7b-hf",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16.0,
+  "lora_bias": false,
+  "lora_dropout": 0.01,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:916af11d0703123ba667e1f3fb7f7f309d584a8ed4448932d2908e2c7e88013c
+size 134235048
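
Note: the recorded size appears consistent with the LoRA settings in adapter_config.json above. The sketch below counts the adapter parameters and compares their fp32 footprint with the 134235048 bytes in the pointer. The Llama-2-7B dimensions (hidden size 4096, 32 decoder layers, square q_proj and v_proj) are assumptions taken from the published base model, not from this commit, and fp32 storage is an inference from the arithmetic rather than something the commit states.

```python
# Assumed Llama-2-7B dimensions (not stated anywhere in this commit).
HIDDEN_SIZE = 4096  # q_proj and v_proj are assumed to be 4096 x 4096
NUM_LAYERS = 32

# Values copied from adapter_config.json.
RANK = 64
LORA_ALPHA = 16.0
TARGET_MODULES = ["q_proj", "v_proj"]

# Size recorded in the Git LFS pointer for adapter_model.safetensors.
FILE_SIZE = 134235048


def lora_param_count(hidden: int, layers: int, rank: int, n_modules: int) -> int:
    """Count LoRA weights: each adapted layer gets A (rank x in) and B (out x rank)."""
    per_module = rank * hidden + hidden * rank
    return layers * n_modules * per_module


params = lora_param_count(HIDDEN_SIZE, NUM_LAYERS, RANK, len(TARGET_MODULES))
fp32_bytes = params * 4            # 4 bytes per fp32 weight
overhead = FILE_SIZE - fp32_bytes  # bytes left over for file metadata
scaling = LORA_ALPHA / RANK        # LoRA update scale when use_rslora is false

print(params)      # 33554432
print(fp32_bytes)  # 134217728
print(overhead)    # 17320
print(scaling)     # 0.25
```

Under these assumptions the tensor data accounts for all but 17320 bytes of the file, which would plausibly be the safetensors header describing the individual tensors.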

added_tokens.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "[PAD]": 32000
+}

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4f6b06f28f0388c0b61c7bbd24012e36baa3d39a961dee7d0dc03afab6d4b891
+size 268543610

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f358ea43734a8aa857e087131b9d0f2bff0982e036a1d94a0e12dd14813682e
+size 14244

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5aeb274e69529baf3171d7a61d2d553638f67c2ed841e18f0520c827a83b052a
+size 1064
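
Note: the three-line entries for the binary files above are Git LFS pointers (spec version, SHA-256 object id, size in bytes), not the file contents. As a minimal sketch, a downloaded file could be checked against its pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of git-lfs or huggingface_hub; the pointer text is the scheduler.pt entry with the diff `+` prefixes removed.

```python
import hashlib

# Pointer for scheduler.pt, copied from the diff without the leading "+".
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5aeb274e69529baf3171d7a61d2d553638f67c2ed841e18f0520c827a83b052a
size 1064
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 named by the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

For the larger files (adapter_model.safetensors, optimizer.pt) it would be preferable to feed the hash in chunks rather than read the whole file into memory.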

special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

tokenizer_config.json ADDED

	@@ -0,0 +1,52 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
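
Note: the README in this commit leaves "How to Get Started" unfilled. The sketch below shows one way the adapter and tokenizer files above might be loaded with PEFT. It assumes a local copy of this folder (the `adapter_dir` argument is a hypothetical path), access to the gated meta-llama/Llama-2-7b-hf weights, and a base vocabulary of 32000 tokens. Because the tokenizer adds [PAD] at id 32000, the base embedding table is resized before the adapter is attached; since modules_to_save is null, the adapter presumably carries no trained row for that token, so the new row would be freshly initialized.

```python
def rows_needed(base_vocab_size: int, added_tokens: dict) -> int:
    """Smallest embedding-table size that covers every token id."""
    highest_added = max(added_tokens.values(), default=-1)
    return max(base_vocab_size, highest_added + 1)


def load_adapter(adapter_dir: str):
    """Load the base model, grow its embeddings if needed, attach the LoRA adapter."""
    # Imported here so rows_needed() can be used without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Make room for [PAD] (id 32000) if the base table has only 32000 rows.
    if len(tokenizer) > base.get_input_embeddings().num_embeddings:
        base.resize_token_embeddings(len(tokenizer))

    model = PeftModel.from_pretrained(base, adapter_dir)
    return model.eval(), tokenizer


# Assumed base vocabulary of 32000 plus the entry from added_tokens.json.
print(rows_needed(32000, {"[PAD]": 32000}))  # 32001
```

Calling `load_adapter` downloads the 7B base model, so it is defined here but not invoked.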

trainer_state.json ADDED
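
Note: the learning_rate values in the log_history of this file appear to follow a linear decay to zero with no warmup. The sketch below reproduces several logged values under that assumption. The peak rate of 3e-4 is inferred from the logged numbers and is not stated in the commit; the total of 2424 steps is the global_step recorded in the file. The function name `linear_decay_lr` is hypothetical.

```python
import math

PEAK_LR = 3e-4      # inferred from the log, not stated in the commit
TOTAL_STEPS = 2424  # global_step recorded in trainer_state.json


def linear_decay_lr(step: int, peak: float = PEAK_LR, total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps under linear decay to zero."""
    return peak * max(0.0, 1.0 - step / total)


# (step, learning_rate) pairs copied from log_history.
LOGGED = {
    10: 0.00029876237623762373,
    20: 0.0002975247524752475,
    1010: 0.000175,
    1340: 0.00013415841584158415,
}

all_match = all(
    math.isclose(linear_decay_lr(step), lr, rel_tol=1e-9)
    for step, lr in LOGGED.items()
)
print(all_match)
```

If the assumption holds, the logged rate at any step can be recovered from the step number alone, which makes it easy to spot a resumed or restarted run in the log.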

	@@ -0,0 +1,1727 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.996599690880989,
+  "eval_steps": 500,
+  "global_step": 2424,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.012364760432766615,
+      "grad_norm": 0.04875350371003151,
+      "learning_rate": 0.00029876237623762373,
+      "loss": 1.0085,
+      "step": 10
+    },
+    {
+      "epoch": 0.02472952086553323,
+      "grad_norm": 0.03758955001831055,
+      "learning_rate": 0.0002975247524752475,
+      "loss": 0.9611,
+      "step": 20
+    },
+    {
+      "epoch": 0.03709428129829984,
+      "grad_norm": 0.036796920001506805,
+      "learning_rate": 0.00029628712871287126,
+      "loss": 0.9033,
+      "step": 30
+    },
+    {
+      "epoch": 0.04945904173106646,
+      "grad_norm": 0.04110224172472954,
+      "learning_rate": 0.000295049504950495,
+      "loss": 0.9154,
+      "step": 40
+    },
+    {
+      "epoch": 0.061823802163833076,
+      "grad_norm": 0.03811247646808624,
+      "learning_rate": 0.0002938118811881188,
+      "loss": 0.9001,
+      "step": 50
+    },
+    {
+      "epoch": 0.07418856259659969,
+      "grad_norm": 0.0396280363202095,
+      "learning_rate": 0.00029257425742574254,
+      "loss": 0.9069,
+      "step": 60
+    },
+    {
+      "epoch": 0.0865533230293663,
+      "grad_norm": 0.036311160773038864,
+      "learning_rate": 0.0002913366336633663,
+      "loss": 0.8905,
+      "step": 70
+    },
+    {
+      "epoch": 0.09891808346213292,
+      "grad_norm": 0.04230085760354996,
+      "learning_rate": 0.00029009900990099006,
+      "loss": 0.928,
+      "step": 80
+    },
+    {
+      "epoch": 0.11128284389489954,
+      "grad_norm": 0.03857972100377083,
+      "learning_rate": 0.0002888613861386138,
+      "loss": 0.9122,
+      "step": 90
+    },
+    {
+      "epoch": 0.12364760432766615,
+      "grad_norm": 0.043333932757377625,
+      "learning_rate": 0.0002876237623762376,
+      "loss": 0.8921,
+      "step": 100
+    },
+    {
+      "epoch": 0.13601236476043277,
+      "grad_norm": 0.03789997100830078,
+      "learning_rate": 0.00028638613861386135,
+      "loss": 0.8988,
+      "step": 110
+    },
+    {
+      "epoch": 0.14837712519319937,
+      "grad_norm": 0.03549322113394737,
+      "learning_rate": 0.0002851485148514851,
+      "loss": 0.899,
+      "step": 120
+    },
+    {
+      "epoch": 0.160741885625966,
+      "grad_norm": 0.04001789167523384,
+      "learning_rate": 0.00028391089108910887,
+      "loss": 0.8937,
+      "step": 130
+    },
+    {
+      "epoch": 0.1731066460587326,
+      "grad_norm": 0.04051420837640762,
+      "learning_rate": 0.00028267326732673263,
+      "loss": 0.8941,
+      "step": 140
+    },
+    {
+      "epoch": 0.18547140649149924,
+      "grad_norm": 0.03879082202911377,
+      "learning_rate": 0.0002814356435643564,
+      "loss": 0.9086,
+      "step": 150
+    },
+    {
+      "epoch": 0.19783616692426584,
+      "grad_norm": 0.03938911855220795,
+      "learning_rate": 0.00028019801980198015,
+      "loss": 0.908,
+      "step": 160
+    },
+    {
+      "epoch": 0.21020092735703247,
+      "grad_norm": 0.04576217010617256,
+      "learning_rate": 0.0002789603960396039,
+      "loss": 0.9058,
+      "step": 170
+    },
+    {
+      "epoch": 0.22256568778979907,
+      "grad_norm": 0.05652037635445595,
+      "learning_rate": 0.00027772277227722773,
+      "loss": 0.8718,
+      "step": 180
+    },
+    {
+      "epoch": 0.23493044822256567,
+      "grad_norm": 0.04269680753350258,
+      "learning_rate": 0.00027648514851485144,
+      "loss": 0.9102,
+      "step": 190
+    },
+    {
+      "epoch": 0.2472952086553323,
+      "grad_norm": 0.046770963817834854,
+      "learning_rate": 0.0002752475247524752,
+      "loss": 0.8893,
+      "step": 200
+    },
+    {
+      "epoch": 0.2596599690880989,
+      "grad_norm": 0.0391731895506382,
+      "learning_rate": 0.000274009900990099,
+      "loss": 0.8858,
+      "step": 210
+    },
+    {
+      "epoch": 0.27202472952086554,
+      "grad_norm": 0.04302387312054634,
+      "learning_rate": 0.0002727722772277227,
+      "loss": 0.8967,
+      "step": 220
+    },
+    {
+      "epoch": 0.28438948995363217,
+      "grad_norm": 0.04622489586472511,
+      "learning_rate": 0.0002715346534653465,
+      "loss": 0.8919,
+      "step": 230
+    },
+    {
+      "epoch": 0.29675425038639874,
+      "grad_norm": 0.04950324073433876,
+      "learning_rate": 0.0002702970297029703,
+      "loss": 0.8898,
+      "step": 240
+    },
+    {
+      "epoch": 0.3091190108191654,
+      "grad_norm": 0.047146428376436234,
+      "learning_rate": 0.000269059405940594,
+      "loss": 0.8941,
+      "step": 250
+    },
+    {
+      "epoch": 0.321483771251932,
+      "grad_norm": 0.04186677187681198,
+      "learning_rate": 0.00026782178217821777,
+      "loss": 0.892,
+      "step": 260
+    },
+    {
+      "epoch": 0.33384853168469864,
+      "grad_norm": 0.04358995333313942,
+      "learning_rate": 0.0002665841584158416,
+      "loss": 0.8813,
+      "step": 270
+    },
+    {
+      "epoch": 0.3462132921174652,
+      "grad_norm": 0.03836526349186897,
+      "learning_rate": 0.00026534653465346534,
+      "loss": 0.8883,
+      "step": 280
+    },
+    {
+      "epoch": 0.35857805255023184,
+      "grad_norm": 0.04279692843556404,
+      "learning_rate": 0.00026410891089108905,
+      "loss": 0.8964,
+      "step": 290
+    },
+    {
+      "epoch": 0.37094281298299847,
+      "grad_norm": 0.04259683936834335,
+      "learning_rate": 0.00026287128712871287,
+      "loss": 0.8864,
+      "step": 300
+    },
+    {
+      "epoch": 0.38330757341576505,
+      "grad_norm": 0.04212690517306328,
+      "learning_rate": 0.00026163366336633663,
+      "loss": 0.8937,
+      "step": 310
+    },
+    {
+      "epoch": 0.3956723338485317,
+      "grad_norm": 0.04193605110049248,
+      "learning_rate": 0.00026039603960396033,
+      "loss": 0.8743,
+      "step": 320
+    },
+    {
+      "epoch": 0.4080370942812983,
+      "grad_norm": 0.043696388602256775,
+      "learning_rate": 0.00025915841584158415,
+      "loss": 0.8919,
+      "step": 330
+    },
+    {
+      "epoch": 0.42040185471406494,
+      "grad_norm": 0.04463732987642288,
+      "learning_rate": 0.0002579207920792079,
+      "loss": 0.8791,
+      "step": 340
+    },
+    {
+      "epoch": 0.4327666151468315,
+      "grad_norm": 0.042219433933496475,
+      "learning_rate": 0.0002566831683168316,
+      "loss": 0.8886,
+      "step": 350
+    },
+    {
+      "epoch": 0.44513137557959814,
+      "grad_norm": 0.04634915664792061,
+      "learning_rate": 0.00025544554455445543,
+      "loss": 0.8798,
+      "step": 360
+    },
+    {
+      "epoch": 0.4574961360123648,
+      "grad_norm": 0.03766421601176262,
+      "learning_rate": 0.0002542079207920792,
+      "loss": 0.8809,
+      "step": 370
+    },
+    {
+      "epoch": 0.46986089644513135,
+      "grad_norm": 0.04153716191649437,
+      "learning_rate": 0.00025297029702970296,
+      "loss": 0.8921,
+      "step": 380
+    },
+    {
+      "epoch": 0.482225656877898,
+      "grad_norm": 0.04694748297333717,
+      "learning_rate": 0.0002517326732673267,
+      "loss": 0.895,
+      "step": 390
+    },
+    {
+      "epoch": 0.4945904173106646,
+      "grad_norm": 0.05713290721178055,
+      "learning_rate": 0.0002504950495049505,
+      "loss": 0.8774,
+      "step": 400
+    },
+    {
+      "epoch": 0.5069551777434312,
+      "grad_norm": 0.0414641909301281,
+      "learning_rate": 0.00024925742574257424,
+      "loss": 0.8908,
+      "step": 410
+    },
+    {
+      "epoch": 0.5193199381761978,
+      "grad_norm": 0.04552585631608963,
+      "learning_rate": 0.000248019801980198,
+      "loss": 0.8843,
+      "step": 420
+    },
+    {
+      "epoch": 0.5316846986089645,
+      "grad_norm": 0.04167173057794571,
+      "learning_rate": 0.00024678217821782176,
+      "loss": 0.8583,
+      "step": 430
+    },
+    {
+      "epoch": 0.5440494590417311,
+      "grad_norm": 0.04508620873093605,
+      "learning_rate": 0.0002455445544554455,
+      "loss": 0.9205,
+      "step": 440
+    },
+    {
+      "epoch": 0.5564142194744977,
+      "grad_norm": 0.04546656087040901,
+      "learning_rate": 0.0002443069306930693,
+      "loss": 0.8857,
+      "step": 450
+    },
+    {
+      "epoch": 0.5687789799072643,
+      "grad_norm": 0.046972740441560745,
+      "learning_rate": 0.00024306930693069305,
+      "loss": 0.8788,
+      "step": 460
+    },
+    {
+      "epoch": 0.5811437403400309,
+      "grad_norm": 0.03991515189409256,
+      "learning_rate": 0.0002418316831683168,
+      "loss": 0.8731,
+      "step": 470
+    },
+    {
+      "epoch": 0.5935085007727975,
+      "grad_norm": 0.047520458698272705,
+      "learning_rate": 0.0002405940594059406,
+      "loss": 0.8986,
+      "step": 480
+    },
+    {
+      "epoch": 0.6058732612055642,
+      "grad_norm": 0.04166582226753235,
+      "learning_rate": 0.00023935643564356433,
+      "loss": 0.8865,
+      "step": 490
+    },
+    {
+      "epoch": 0.6182380216383307,
+      "grad_norm": 0.043145373463630676,
+      "learning_rate": 0.0002381188118811881,
+      "loss": 0.8738,
+      "step": 500
+    },
+    {
+      "epoch": 0.6306027820710973,
+      "grad_norm": 0.04514694958925247,
+      "learning_rate": 0.00023688118811881188,
+      "loss": 0.8976,
+      "step": 510
+    },
+    {
+      "epoch": 0.642967542503864,
+      "grad_norm": 0.03927430510520935,
+      "learning_rate": 0.00023564356435643561,
+      "loss": 0.8953,
+      "step": 520
+    },
+    {
+      "epoch": 0.6553323029366306,
+      "grad_norm": 0.048136577010154724,
+      "learning_rate": 0.00023440594059405938,
+      "loss": 0.8979,
+      "step": 530
+    },
+    {
+      "epoch": 0.6676970633693973,
+      "grad_norm": 0.043889183551073074,
+      "learning_rate": 0.00023316831683168316,
+      "loss": 0.9065,
+      "step": 540
+    },
+    {
+      "epoch": 0.6800618238021638,
+      "grad_norm": 0.05049331858754158,
+      "learning_rate": 0.0002319306930693069,
+      "loss": 0.8941,
+      "step": 550
+    },
+    {
+      "epoch": 0.6924265842349304,
+      "grad_norm": 0.04710015654563904,
+      "learning_rate": 0.00023069306930693066,
+      "loss": 0.8716,
+      "step": 560
+    },
+    {
+      "epoch": 0.7047913446676971,
+      "grad_norm": 0.04900379851460457,
+      "learning_rate": 0.00022945544554455445,
+      "loss": 0.9017,
+      "step": 570
+    },
+    {
+      "epoch": 0.7171561051004637,
+      "grad_norm": 0.0502135343849659,
+      "learning_rate": 0.0002282178217821782,
+      "loss": 0.8774,
+      "step": 580
+    },
+    {
+      "epoch": 0.7295208655332303,
+      "grad_norm": 0.047438524663448334,
+      "learning_rate": 0.00022698019801980194,
+      "loss": 0.8815,
+      "step": 590
+    },
+    {
+      "epoch": 0.7418856259659969,
+      "grad_norm": 0.04633660241961479,
+      "learning_rate": 0.00022574257425742573,
+      "loss": 0.8746,
+      "step": 600
+    },
+    {
+      "epoch": 0.7542503863987635,
+      "grad_norm": 0.04724352806806564,
+      "learning_rate": 0.0002245049504950495,
+      "loss": 0.9005,
+      "step": 610
+    },
+    {
+      "epoch": 0.7666151468315301,
+      "grad_norm": 0.04557831212878227,
+      "learning_rate": 0.00022326732673267323,
+      "loss": 0.8798,
+      "step": 620
+    },
+    {
+      "epoch": 0.7789799072642968,
+      "grad_norm": 0.047848157584667206,
+      "learning_rate": 0.00022202970297029702,
+      "loss": 0.893,
+      "step": 630
+    },
+    {
+      "epoch": 0.7913446676970634,
+      "grad_norm": 0.04449377954006195,
+      "learning_rate": 0.00022079207920792078,
+      "loss": 0.8851,
+      "step": 640
+    },
+    {
+      "epoch": 0.80370942812983,
+      "grad_norm": 0.04431360587477684,
+      "learning_rate": 0.0002195544554455445,
+      "loss": 0.8876,
+      "step": 650
+    },
+    {
+      "epoch": 0.8160741885625966,
+      "grad_norm": 0.04388862103223801,
+      "learning_rate": 0.0002183168316831683,
+      "loss": 0.8886,
+      "step": 660
+    },
+    {
+      "epoch": 0.8284389489953632,
+      "grad_norm": 0.04719037562608719,
+      "learning_rate": 0.00021707920792079206,
+      "loss": 0.9011,
+      "step": 670
+    },
+    {
+      "epoch": 0.8408037094281299,
+      "grad_norm": 0.04291271045804024,
+      "learning_rate": 0.00021584158415841585,
+      "loss": 0.8738,
+      "step": 680
+    },
+    {
+      "epoch": 0.8531684698608965,
+      "grad_norm": 0.04412473365664482,
+      "learning_rate": 0.00021460396039603958,
+      "loss": 0.8738,
+      "step": 690
+    },
+    {
+      "epoch": 0.865533230293663,
+      "grad_norm": 0.046331875026226044,
+      "learning_rate": 0.00021336633663366334,
+      "loss": 0.8899,
+      "step": 700
+    },
+    {
+      "epoch": 0.8778979907264297,
+      "grad_norm": 0.04418357461690903,
+      "learning_rate": 0.00021212871287128713,
+      "loss": 0.889,
+      "step": 710
+    },
+    {
+      "epoch": 0.8902627511591963,
+      "grad_norm": 0.04221678525209427,
+      "learning_rate": 0.00021089108910891087,
+      "loss": 0.8751,
+      "step": 720
+    },
+    {
+      "epoch": 0.9026275115919629,
+      "grad_norm": 0.04472072795033455,
+      "learning_rate": 0.00020965346534653463,
+      "loss": 0.8775,
+      "step": 730
+    },
+    {
+      "epoch": 0.9149922720247295,
+      "grad_norm": 0.04348697140812874,
+      "learning_rate": 0.00020841584158415842,
+      "loss": 0.8751,
+      "step": 740
+    },
+    {
+      "epoch": 0.9273570324574961,
+      "grad_norm": 0.04851846024394035,
+      "learning_rate": 0.00020717821782178215,
+      "loss": 0.8951,
+      "step": 750
+    },
+    {
+      "epoch": 0.9397217928902627,
+      "grad_norm": 0.04079887643456459,
+      "learning_rate": 0.0002059405940594059,
+      "loss": 0.9028,
+      "step": 760
+    },
+    {
+      "epoch": 0.9520865533230294,
+      "grad_norm": 0.04408387467265129,
+      "learning_rate": 0.0002047029702970297,
+      "loss": 0.8842,
+      "step": 770
+    },
+    {
+      "epoch": 0.964451313755796,
+      "grad_norm": 0.04127747192978859,
+      "learning_rate": 0.00020346534653465346,
+      "loss": 0.876,
+      "step": 780
+    },
+    {
+      "epoch": 0.9768160741885626,
+      "grad_norm": 0.05341732129454613,
+      "learning_rate": 0.0002022277227722772,
+      "loss": 0.8843,
+      "step": 790
+    },
+    {
+      "epoch": 0.9891808346213292,
+      "grad_norm": 0.04694453999400139,
+      "learning_rate": 0.00020099009900990098,
+      "loss": 0.8815,
+      "step": 800
+    },
+    {
+      "epoch": 1.0012364760432766,
+      "grad_norm": 0.05160349979996681,
+      "learning_rate": 0.00019975247524752475,
+      "loss": 0.8857,
+      "step": 810
+    },
+    {
+      "epoch": 1.0136012364760432,
+      "grad_norm": 0.0415058434009552,
+      "learning_rate": 0.00019851485148514848,
+      "loss": 0.8461,
+      "step": 820
+    },
+    {
+      "epoch": 1.02596599690881,
+      "grad_norm": 0.04516944661736488,
+      "learning_rate": 0.00019727722772277227,
+      "loss": 0.8548,
+      "step": 830
+    },
+    {
+      "epoch": 1.0383307573415765,
+      "grad_norm": 0.04707406461238861,
+      "learning_rate": 0.00019603960396039603,
+      "loss": 0.8503,
+      "step": 840
+    },
+    {
+      "epoch": 1.0506955177743431,
+      "grad_norm": 0.049354761838912964,
+      "learning_rate": 0.00019480198019801976,
+      "loss": 0.8584,
+      "step": 850
+    },
+    {
+      "epoch": 1.0630602782071097,
+      "grad_norm": 0.04959525167942047,
+      "learning_rate": 0.00019356435643564355,
+      "loss": 0.8788,
+      "step": 860
+    },
+    {
+      "epoch": 1.0754250386398763,
+      "grad_norm": 0.048685044050216675,
+      "learning_rate": 0.0001923267326732673,
+      "loss": 0.873,
+      "step": 870
+    },
+    {
+      "epoch": 1.087789799072643,
+      "grad_norm": 0.045906998217105865,
+      "learning_rate": 0.00019108910891089107,
+      "loss": 0.8775,
+      "step": 880
+    },
+    {
+      "epoch": 1.1001545595054096,
+      "grad_norm": 0.04486127197742462,
+      "learning_rate": 0.00018985148514851484,
+      "loss": 0.8647,
+      "step": 890
+    },
+    {
+      "epoch": 1.1125193199381762,
+      "grad_norm": 0.05211256071925163,
+      "learning_rate": 0.0001886138613861386,
+      "loss": 0.8594,
+      "step": 900
+    },
+    {
+      "epoch": 1.1248840803709428,
+      "grad_norm": 0.05048747360706329,
+      "learning_rate": 0.00018737623762376236,
+      "loss": 0.8564,
+      "step": 910
+    },
+    {
+      "epoch": 1.1372488408037094,
+      "grad_norm": 0.04840526729822159,
+      "learning_rate": 0.00018613861386138612,
+      "loss": 0.8782,
+      "step": 920
+    },
+    {
+      "epoch": 1.1496136012364762,
+      "grad_norm": 0.049940045922994614,
+      "learning_rate": 0.00018490099009900988,
+      "loss": 0.867,
+      "step": 930
+    },
+    {
+      "epoch": 1.1619783616692427,
+      "grad_norm": 0.053729794919490814,
+      "learning_rate": 0.00018366336633663364,
+      "loss": 0.8585,
+      "step": 940
+    },
+    {
+      "epoch": 1.1743431221020093,
+      "grad_norm": 0.05020948871970177,
+      "learning_rate": 0.0001824257425742574,
+      "loss": 0.8688,
+      "step": 950
+    },
+    {
+      "epoch": 1.1867078825347759,
+      "grad_norm": 0.0517219677567482,
+      "learning_rate": 0.00018118811881188116,
+      "loss": 0.8731,
+      "step": 960
+    },
+    {
+      "epoch": 1.1990726429675425,
+      "grad_norm": 0.04891285300254822,
+      "learning_rate": 0.00017995049504950493,
+      "loss": 0.8348,
+      "step": 970
+    },
+    {
+      "epoch": 1.211437403400309,
+      "grad_norm": 0.051312196999788284,
+      "learning_rate": 0.00017871287128712871,
+      "loss": 0.8658,
+      "step": 980
+    },
+    {
+      "epoch": 1.2238021638330758,
+      "grad_norm": 0.051922161132097244,
+      "learning_rate": 0.00017747524752475245,
+      "loss": 0.8542,
+      "step": 990
+    },
+    {
+      "epoch": 1.2361669242658424,
+      "grad_norm": 0.0521603561937809,
+      "learning_rate": 0.0001762376237623762,
+      "loss": 0.8628,
+      "step": 1000
+    },
+    {
+      "epoch": 1.248531684698609,
+      "grad_norm": 0.05443425104022026,
+      "learning_rate": 0.000175,
+      "loss": 0.875,
+      "step": 1010
+    },
+    {
+      "epoch": 1.2608964451313756,
+      "grad_norm": 0.05506913363933563,
+      "learning_rate": 0.00017376237623762373,
+      "loss": 0.8704,
+      "step": 1020
+    },
+    {
+      "epoch": 1.2732612055641421,
+      "grad_norm": 0.05535837262868881,
+      "learning_rate": 0.00017252475247524752,
+      "loss": 0.8629,
+      "step": 1030
+    },
+    {
+      "epoch": 1.2856259659969087,
+      "grad_norm": 0.050953879952430725,
+      "learning_rate": 0.00017128712871287128,
+      "loss": 0.8386,
+      "step": 1040
+    },
+    {
+      "epoch": 1.2979907264296755,
+      "grad_norm": 0.047925543040037155,
+      "learning_rate": 0.00017004950495049502,
+      "loss": 0.8664,
+      "step": 1050
+    },
+    {
+      "epoch": 1.310355486862442,
+      "grad_norm": 0.054691240191459656,
+      "learning_rate": 0.0001688118811881188,
+      "loss": 0.8634,
+      "step": 1060
+    },
+    {
+      "epoch": 1.3227202472952087,
+      "grad_norm": 0.05087495222687721,
+      "learning_rate": 0.00016757425742574257,
+      "loss": 0.8482,
+      "step": 1070
+    },
+    {
+      "epoch": 1.3350850077279752,
+      "grad_norm": 0.051902078092098236,
+      "learning_rate": 0.00016633663366336633,
+      "loss": 0.8478,
+      "step": 1080
+    },
+    {
+      "epoch": 1.3474497681607418,
+      "grad_norm": 0.05033488944172859,
+      "learning_rate": 0.0001650990099009901,
+      "loss": 0.8572,
+      "step": 1090
+    },
+    {
+      "epoch": 1.3598145285935086,
+      "grad_norm": 0.05153260752558708,
+      "learning_rate": 0.00016386138613861385,
+      "loss": 0.8465,
+      "step": 1100
+    },
+    {
+      "epoch": 1.3721792890262752,
+      "grad_norm": 0.052806247025728226,
+      "learning_rate": 0.0001626237623762376,
+      "loss": 0.8707,
+      "step": 1110
+    },
+    {
+      "epoch": 1.3845440494590417,
+      "grad_norm": 0.05425600707530975,
+      "learning_rate": 0.00016138613861386137,
+      "loss": 0.858,
+      "step": 1120
+    },
+    {
+      "epoch": 1.3969088098918083,
+      "grad_norm": 0.05116913095116615,
+      "learning_rate": 0.00016014851485148513,
+      "loss": 0.867,
+      "step": 1130
+    },
+    {
+      "epoch": 1.409273570324575,
+      "grad_norm": 0.052799541503190994,
+      "learning_rate": 0.0001589108910891089,
+      "loss": 0.849,
+      "step": 1140
+    },
+    {
+      "epoch": 1.4216383307573417,
+      "grad_norm": 0.06275513023138046,
+      "learning_rate": 0.00015767326732673266,
+      "loss": 0.8577,
+      "step": 1150
+    },
+    {
+      "epoch": 1.4340030911901083,
+      "grad_norm": 0.051965054124593735,
+      "learning_rate": 0.00015643564356435642,
+      "loss": 0.853,
+      "step": 1160
+    },
+    {
+      "epoch": 1.4463678516228748,
+      "grad_norm": 0.05356529727578163,
+      "learning_rate": 0.00015519801980198018,
+      "loss": 0.8789,
+      "step": 1170
+    },
+    {
+      "epoch": 1.4587326120556414,
+      "grad_norm": 0.05566537007689476,
+      "learning_rate": 0.00015396039603960397,
+      "loss": 0.8716,
+      "step": 1180
+    },
+    {
+      "epoch": 1.471097372488408,
+      "grad_norm": 0.05320986732840538,
+      "learning_rate": 0.0001527227722772277,
+      "loss": 0.8736,
+      "step": 1190
+    },
+    {
+      "epoch": 1.4834621329211746,
+      "grad_norm": 0.049232013523578644,
+      "learning_rate": 0.00015148514851485146,
+      "loss": 0.849,
+      "step": 1200
+    },
+    {
+      "epoch": 1.4958268933539411,
+      "grad_norm": 0.058629848062992096,
+      "learning_rate": 0.00015024752475247525,
+      "loss": 0.8732,
+      "step": 1210
+    },
+    {
+      "epoch": 1.508191653786708,
+      "grad_norm": 0.055390194058418274,
+      "learning_rate": 0.000149009900990099,
+      "loss": 0.8678,
+      "step": 1220
+    },
+    {
+      "epoch": 1.5205564142194745,
+      "grad_norm": 0.05527270585298538,
+      "learning_rate": 0.00014777227722772275,
+      "loss": 0.8643,
+      "step": 1230
+    },
+    {
+      "epoch": 1.532921174652241,
+      "grad_norm": 0.04652067646384239,
+      "learning_rate": 0.00014653465346534653,
+      "loss": 0.8429,
+      "step": 1240
+    },
+    {
+      "epoch": 1.545285935085008,
+      "grad_norm": 0.05379781499505043,
+      "learning_rate": 0.0001452970297029703,
+      "loss": 0.8492,
+      "step": 1250
+    },
+    {
+      "epoch": 1.5576506955177742,
+      "grad_norm": 0.05249844118952751,
+      "learning_rate": 0.00014405940594059403,
+      "loss": 0.8447,
+      "step": 1260
+    },
+    {
+      "epoch": 1.570015455950541,
+      "grad_norm": 0.049639806151390076,
+      "learning_rate": 0.00014282178217821782,
+      "loss": 0.8667,
+      "step": 1270
+    },
+    {
+      "epoch": 1.5823802163833076,
+      "grad_norm": 0.052679967135190964,
+      "learning_rate": 0.00014158415841584158,
+      "loss": 0.8739,
+      "step": 1280
+    },
+    {
+      "epoch": 1.5947449768160742,
+      "grad_norm": 0.05990573391318321,
+      "learning_rate": 0.00014034653465346534,
+      "loss": 0.8824,
+      "step": 1290
+    },
+    {
+      "epoch": 1.6071097372488408,
+      "grad_norm": 0.052240803837776184,
+      "learning_rate": 0.0001391089108910891,
+      "loss": 0.868,
+      "step": 1300
+    },
+    {
+      "epoch": 1.6194744976816073,
+      "grad_norm": 0.05380776897072792,
+      "learning_rate": 0.00013787128712871286,
+      "loss": 0.8442,
+      "step": 1310
+    },
+    {
+      "epoch": 1.6318392581143741,
+      "grad_norm": 0.05081896856427193,
+      "learning_rate": 0.00013663366336633662,
+      "loss": 0.8515,
+      "step": 1320
+    },
+    {
+      "epoch": 1.6442040185471405,
+      "grad_norm": 0.04869316518306732,
+      "learning_rate": 0.00013539603960396039,
+      "loss": 0.8339,
+      "step": 1330
+    },
+    {
+      "epoch": 1.6565687789799073,
+      "grad_norm": 0.056119490414857864,
+      "learning_rate": 0.00013415841584158415,
+      "loss": 0.8483,
+      "step": 1340
+    },
+    {
+      "epoch": 1.6689335394126739,
+      "grad_norm": 0.05742491036653519,
+      "learning_rate": 0.0001329207920792079,
+      "loss": 0.8494,
+      "step": 1350
+    },
+    {
+      "epoch": 1.6812982998454404,
+      "grad_norm": 0.055017951875925064,
+      "learning_rate": 0.00013168316831683167,
+      "loss": 0.8563,
+      "step": 1360
+    },
+    {
+      "epoch": 1.6936630602782072,
+      "grad_norm": 0.04963842034339905,
+      "learning_rate": 0.00013044554455445543,
+      "loss": 0.8405,
+      "step": 1370
+    },
+    {
+      "epoch": 1.7060278207109736,
+      "grad_norm": 0.05574873462319374,
+      "learning_rate": 0.0001292079207920792,
+      "loss": 0.8557,
+      "step": 1380
+    },
+    {
+      "epoch": 1.7183925811437404,
+      "grad_norm": 0.05482814088463783,
+      "learning_rate": 0.00012797029702970295,
+      "loss": 0.8559,
+      "step": 1390
+    },
+    {
+      "epoch": 1.730757341576507,
+      "grad_norm": 0.06040499359369278,
+      "learning_rate": 0.00012673267326732672,
+      "loss": 0.8638,
+      "step": 1400
+    },
+    {
+      "epoch": 1.7431221020092735,
+      "grad_norm": 0.05430367961525917,
+      "learning_rate": 0.00012549504950495048,
+      "loss": 0.8473,
+      "step": 1410
+    },
+    {
+      "epoch": 1.7554868624420403,
+      "grad_norm": 0.048315104097127914,
+      "learning_rate": 0.00012425742574257426,
+      "loss": 0.845,
+      "step": 1420
+    },
+    {
+      "epoch": 1.7678516228748067,
+      "grad_norm": 0.05943458899855614,
+      "learning_rate": 0.000123019801980198,
+      "loss": 0.853,
+      "step": 1430
+    },
+    {
+      "epoch": 1.7802163833075735,
+      "grad_norm": 0.05744357407093048,
+      "learning_rate": 0.00012178217821782177,
+      "loss": 0.8562,
+      "step": 1440
+    },
+    {
+      "epoch": 1.79258114374034,
+      "grad_norm": 0.06155743822455406,
+      "learning_rate": 0.00012054455445544554,
+      "loss": 0.8404,
+      "step": 1450
+    },
+    {
+      "epoch": 1.8049459041731066,
+      "grad_norm": 0.04887942224740982,
+      "learning_rate": 0.0001193069306930693,
+      "loss": 0.8477,
+      "step": 1460
+    },
+    {
+      "epoch": 1.8173106646058734,
+      "grad_norm": 0.05377992242574692,
+      "learning_rate": 0.00011806930693069306,
+      "loss": 0.8765,
+      "step": 1470
+    },
+    {
+      "epoch": 1.8296754250386398,
+      "grad_norm": 0.0468844473361969,
+      "learning_rate": 0.00011683168316831682,
+      "loss": 0.8151,
+      "step": 1480
+    },
+    {
+      "epoch": 1.8420401854714066,
+      "grad_norm": 0.05763052776455879,
+      "learning_rate": 0.0001155940594059406,
+      "loss": 0.8539,
+      "step": 1490
+    },
+    {
+      "epoch": 1.8544049459041732,
+      "grad_norm": 0.054946307092905045,
+      "learning_rate": 0.00011435643564356434,
+      "loss": 0.8564,
+      "step": 1500
+    },
+    {
+      "epoch": 1.8667697063369397,
+      "grad_norm": 0.060760248452425,
+      "learning_rate": 0.0001131188118811881,
+      "loss": 0.8498,
+      "step": 1510
+    },
+    {
+      "epoch": 1.8791344667697063,
+      "grad_norm": 0.058039598166942596,
+      "learning_rate": 0.00011188118811881188,
+      "loss": 0.8647,
+      "step": 1520
+    },
+    {
+      "epoch": 1.8914992272024729,
+      "grad_norm": 0.05479070916771889,
+      "learning_rate": 0.00011064356435643564,
+      "loss": 0.8625,
+      "step": 1530
+    },
+    {
+      "epoch": 1.9038639876352397,
+      "grad_norm": 0.0583939254283905,
+      "learning_rate": 0.00010940594059405939,
+      "loss": 0.8695,
+      "step": 1540
+    },
+    {
+      "epoch": 1.916228748068006,
+      "grad_norm": 0.058852337300777435,
+      "learning_rate": 0.00010816831683168316,
+      "loss": 0.8443,
+      "step": 1550
+    },
+    {
+      "epoch": 1.9285935085007728,
+      "grad_norm": 0.05506705492734909,
+      "learning_rate": 0.00010693069306930692,
+      "loss": 0.8544,
+      "step": 1560
+    },
+    {
+      "epoch": 1.9409582689335394,
+      "grad_norm": 0.05682089179754257,
+      "learning_rate": 0.00010569306930693068,
+      "loss": 0.8718,
+      "step": 1570
+    },
+    {
+      "epoch": 1.953323029366306,
+      "grad_norm": 0.05604562535881996,
+      "learning_rate": 0.00010445544554455445,
+      "loss": 0.857,
+      "step": 1580
+    },
+    {
+      "epoch": 1.9656877897990728,
+      "grad_norm": 0.058413226157426834,
+      "learning_rate": 0.0001032178217821782,
+      "loss": 0.8558,
+      "step": 1590
+    },
+    {
+      "epoch": 1.9780525502318391,
+      "grad_norm": 0.054590627551078796,
+      "learning_rate": 0.00010198019801980197,
+      "loss": 0.8443,
+      "step": 1600
+    },
+    {
+      "epoch": 1.990417310664606,
+      "grad_norm": 0.05447821691632271,
+      "learning_rate": 0.00010074257425742573,
+      "loss": 0.8672,
+      "step": 1610
+    },
+    {
+      "epoch": 2.002472952086553,
+      "grad_norm": 0.05398769676685333,
+      "learning_rate": 9.95049504950495e-05,
+      "loss": 0.8582,
+      "step": 1620
+    },
+    {
+      "epoch": 2.01483771251932,
+      "grad_norm": 0.057375263422727585,
+      "learning_rate": 9.826732673267325e-05,
+      "loss": 0.8418,
+      "step": 1630
+    },
+    {
+      "epoch": 2.0272024729520863,
+      "grad_norm": 0.054974183440208435,
+      "learning_rate": 9.702970297029701e-05,
+      "loss": 0.8224,
+      "step": 1640
+    },
+    {
+      "epoch": 2.039567233384853,
+      "grad_norm": 0.06044444069266319,
+      "learning_rate": 9.579207920792079e-05,
+      "loss": 0.8373,
+      "step": 1650
+    },
+    {
+      "epoch": 2.05193199381762,
+      "grad_norm": 0.06379813700914383,
+      "learning_rate": 9.455445544554454e-05,
+      "loss": 0.8311,
+      "step": 1660
+    },
+    {
+      "epoch": 2.0642967542503863,
+      "grad_norm": 0.05604099482297897,
+      "learning_rate": 9.331683168316831e-05,
+      "loss": 0.8586,
+      "step": 1670
+    },
+    {
+      "epoch": 2.076661514683153,
+      "grad_norm": 0.05408864468336105,
+      "learning_rate": 9.207920792079207e-05,
+      "loss": 0.8385,
+      "step": 1680
+    },
+    {
+      "epoch": 2.0890262751159194,
+      "grad_norm": 0.06171610206365585,
+      "learning_rate": 9.084158415841582e-05,
+      "loss": 0.836,
+      "step": 1690
+    },
+    {
+      "epoch": 2.1013910355486862,
+      "grad_norm": 0.05357811599969864,
+      "learning_rate": 8.96039603960396e-05,
+      "loss": 0.8365,
+      "step": 1700
+    },
+    {
+      "epoch": 2.113755795981453,
+      "grad_norm": 0.059701114892959595,
+      "learning_rate": 8.836633663366336e-05,
+      "loss": 0.8168,
+      "step": 1710
+    },
+    {
+      "epoch": 2.1261205564142194,
+      "grad_norm": 0.05693197622895241,
+      "learning_rate": 8.712871287128713e-05,
+      "loss": 0.8588,
+      "step": 1720
+    },
+    {
+      "epoch": 2.138485316846986,
+      "grad_norm": 0.06465724855661392,
+      "learning_rate": 8.589108910891088e-05,
+      "loss": 0.8342,
+      "step": 1730
+    },
+    {
+      "epoch": 2.1508500772797525,
+      "grad_norm": 0.06339121609926224,
+      "learning_rate": 8.465346534653464e-05,
+      "loss": 0.8338,
+      "step": 1740
+    },
+    {
+      "epoch": 2.1632148377125193,
+      "grad_norm": 0.05768771097064018,
+      "learning_rate": 8.341584158415841e-05,
+      "loss": 0.8321,
+      "step": 1750
+    },
+    {
+      "epoch": 2.175579598145286,
+      "grad_norm": 0.05351224169135094,
+      "learning_rate": 8.217821782178216e-05,
+      "loss": 0.8426,
+      "step": 1760
+    },
+    {
+      "epoch": 2.1879443585780525,
+      "grad_norm": 0.06381036341190338,
+      "learning_rate": 8.094059405940594e-05,
+      "loss": 0.8531,
+      "step": 1770
+    },
+    {
+      "epoch": 2.2003091190108193,
+      "grad_norm": 0.057617682963609695,
+      "learning_rate": 7.97029702970297e-05,
+      "loss": 0.8263,
+      "step": 1780
+    },
+    {
+      "epoch": 2.2126738794435856,
+      "grad_norm": 0.06280315667390823,
+      "learning_rate": 7.846534653465345e-05,
+      "loss": 0.8073,
+      "step": 1790
+    },
+    {
+      "epoch": 2.2250386398763524,
+      "grad_norm": 0.06251993030309677,
+      "learning_rate": 7.722772277227722e-05,
+      "loss": 0.8285,
+      "step": 1800
+    },
+    {
+      "epoch": 2.237403400309119,
+      "grad_norm": 0.05487222224473953,
+      "learning_rate": 7.599009900990098e-05,
+      "loss": 0.8389,
+      "step": 1810
+    },
+    {
+      "epoch": 2.2497681607418856,
+      "grad_norm": 0.06212658807635307,
+      "learning_rate": 7.475247524752474e-05,
+      "loss": 0.819,
+      "step": 1820
+    },
+    {
+      "epoch": 2.2621329211746524,
+      "grad_norm": 0.06791824847459793,
+      "learning_rate": 7.35148514851485e-05,
+      "loss": 0.823,
+      "step": 1830
+    },
+    {
+      "epoch": 2.2744976816074187,
+      "grad_norm": 0.06564588844776154,
+      "learning_rate": 7.227722772277227e-05,
+      "loss": 0.8399,
+      "step": 1840
+    },
+    {
+      "epoch": 2.2868624420401855,
+      "grad_norm": 0.07918984442949295,
+      "learning_rate": 7.103960396039604e-05,
+      "loss": 0.8441,
+      "step": 1850
+    },
+    {
+      "epoch": 2.2992272024729523,
+      "grad_norm": 0.06684021651744843,
+      "learning_rate": 6.98019801980198e-05,
+      "loss": 0.8213,
+      "step": 1860
+    },
+    {
+      "epoch": 2.3115919629057187,
+      "grad_norm": 0.05864300578832626,
+      "learning_rate": 6.856435643564355e-05,
+      "loss": 0.8238,
+      "step": 1870
+    },
+    {
+      "epoch": 2.3239567233384855,
+      "grad_norm": 0.05827944353222847,
+      "learning_rate": 6.732673267326732e-05,
+      "loss": 0.8438,
+      "step": 1880
+    },
+    {
+      "epoch": 2.336321483771252,
+      "grad_norm": 0.05539786070585251,
+      "learning_rate": 6.608910891089109e-05,
+      "loss": 0.8173,
+      "step": 1890
+    },
+    {
+      "epoch": 2.3486862442040186,
+      "grad_norm": 0.06571885198354721,
+      "learning_rate": 6.485148514851485e-05,
+      "loss": 0.8262,
+      "step": 1900
+    },
+    {
+      "epoch": 2.361051004636785,
+      "grad_norm": 0.06220625340938568,
+      "learning_rate": 6.361386138613861e-05,
+      "loss": 0.8576,
+      "step": 1910
+    },
+    {
+      "epoch": 2.3734157650695518,
+      "grad_norm": 0.0579352080821991,
+      "learning_rate": 6.237623762376237e-05,
+      "loss": 0.8227,
+      "step": 1920
+    },
+    {
+      "epoch": 2.3857805255023186,
+      "grad_norm": 0.06193961948156357,
+      "learning_rate": 6.113861386138613e-05,
+      "loss": 0.8414,
+      "step": 1930
+    },
+    {
+      "epoch": 2.398145285935085,
+      "grad_norm": 0.061364226043224335,
+      "learning_rate": 5.99009900990099e-05,
+      "loss": 0.8387,
+      "step": 1940
+    },
+    {
+      "epoch": 2.4105100463678517,
+      "grad_norm": 0.05785266309976578,
+      "learning_rate": 5.866336633663366e-05,
+      "loss": 0.8284,
+      "step": 1950
+    },
+    {
+      "epoch": 2.422874806800618,
+      "grad_norm": 0.057832520455121994,
+      "learning_rate": 5.742574257425742e-05,
+      "loss": 0.8197,
+      "step": 1960
+    },
+    {
+      "epoch": 2.435239567233385,
+      "grad_norm": 0.06421726942062378,
+      "learning_rate": 5.618811881188118e-05,
+      "loss": 0.8402,
+      "step": 1970
+    },
+    {
+      "epoch": 2.4476043276661517,
+      "grad_norm": 0.06815137714147568,
+      "learning_rate": 5.4950495049504944e-05,
+      "loss": 0.8389,
+      "step": 1980
+    },
+    {
+      "epoch": 2.459969088098918,
+      "grad_norm": 0.06730205565690994,
+      "learning_rate": 5.371287128712871e-05,
+      "loss": 0.8604,
+      "step": 1990
+    },
+    {
+      "epoch": 2.472333848531685,
+      "grad_norm": 0.05876993387937546,
+      "learning_rate": 5.247524752475247e-05,
+      "loss": 0.8254,
+      "step": 2000
+    },
+    {
+      "epoch": 2.484698608964451,
+      "grad_norm": 0.06757384538650513,
+      "learning_rate": 5.1237623762376234e-05,
+      "loss": 0.8292,
+      "step": 2010
+    },
+    {
+      "epoch": 2.497063369397218,
+      "grad_norm": 0.06531625986099243,
+      "learning_rate": 4.9999999999999996e-05,
+      "loss": 0.8321,
+      "step": 2020
+    },
+    {
+      "epoch": 2.5094281298299848,
+      "grad_norm": 0.060086678713560104,
+      "learning_rate": 4.876237623762376e-05,
+      "loss": 0.8459,
+      "step": 2030
+    },
+    {
+      "epoch": 2.521792890262751,
+      "grad_norm": 0.06336929649114609,
+      "learning_rate": 4.752475247524752e-05,
+      "loss": 0.82,
+      "step": 2040
+    },
+    {
+      "epoch": 2.534157650695518,
+      "grad_norm": 0.06393607705831528,
+      "learning_rate": 4.6287128712871286e-05,
+      "loss": 0.8313,
+      "step": 2050
+    },
+    {
+      "epoch": 2.5465224111282843,
+      "grad_norm": 0.06480514258146286,
+      "learning_rate": 4.504950495049505e-05,
+      "loss": 0.8442,
+      "step": 2060
+    },
+    {
+      "epoch": 2.558887171561051,
+      "grad_norm": 0.07233238965272903,
+      "learning_rate": 4.38118811881188e-05,
+      "loss": 0.8256,
+      "step": 2070
+    },
+    {
+      "epoch": 2.5712519319938174,
+      "grad_norm": 0.06636520475149155,
+      "learning_rate": 4.257425742574257e-05,
+      "loss": 0.8284,
+      "step": 2080
+    },
+    {
+      "epoch": 2.583616692426584,
+      "grad_norm": 0.07016933709383011,
+      "learning_rate": 4.133663366336633e-05,
+      "loss": 0.8378,
+      "step": 2090
+    },
+    {
+      "epoch": 2.595981452859351,
+      "grad_norm": 0.06668656319379807,
+      "learning_rate": 4.00990099009901e-05,
+      "loss": 0.8461,
+      "step": 2100
+    },
+    {
+      "epoch": 2.6083462132921174,
+      "grad_norm": 0.07296927273273468,
+      "learning_rate": 3.886138613861386e-05,
+      "loss": 0.8432,
+      "step": 2110
+    },
+    {
+      "epoch": 2.620710973724884,
+      "grad_norm": 0.06670273840427399,
+      "learning_rate": 3.7623762376237615e-05,
+      "loss": 0.819,
+      "step": 2120
+    },
+    {
+      "epoch": 2.633075734157651,
+      "grad_norm": 0.060203880071640015,
+      "learning_rate": 3.638613861386138e-05,
+      "loss": 0.8063,
+      "step": 2130
+    },
+    {
+      "epoch": 2.6454404945904173,
+      "grad_norm": 0.06635984778404236,
+      "learning_rate": 3.5148514851485144e-05,
+      "loss": 0.8364,
+      "step": 2140
+    },
+    {
+      "epoch": 2.6578052550231837,
+      "grad_norm": 0.060412149876356125,
+      "learning_rate": 3.3910891089108906e-05,
+      "loss": 0.827,
+      "step": 2150
+    },
+    {
+      "epoch": 2.6701700154559505,
+      "grad_norm": 0.05948295816779137,
+      "learning_rate": 3.267326732673267e-05,
+      "loss": 0.802,
+      "step": 2160
+    },
+    {
+      "epoch": 2.6825347758887172,
+      "grad_norm": 0.06251130253076553,
+      "learning_rate": 3.1435643564356435e-05,
+      "loss": 0.8302,
+      "step": 2170
+    },
+    {
+      "epoch": 2.6948995363214836,
+      "grad_norm": 0.06650058180093765,
+      "learning_rate": 3.0198019801980193e-05,
+      "loss": 0.8386,
+      "step": 2180
+    },
+    {
+      "epoch": 2.7072642967542504,
+      "grad_norm": 0.07029715925455093,
+      "learning_rate": 2.8960396039603958e-05,
+      "loss": 0.8421,
+      "step": 2190
+    },
+    {
+      "epoch": 2.719629057187017,
+      "grad_norm": 0.06135771796107292,
+      "learning_rate": 2.772277227722772e-05,
+      "loss": 0.829,
+      "step": 2200
+    },
+    {
+      "epoch": 2.7319938176197835,
+      "grad_norm": 0.06303984671831131,
+      "learning_rate": 2.6485148514851484e-05,
+      "loss": 0.8408,
+      "step": 2210
+    },
+    {
+      "epoch": 2.7443585780525503,
+      "grad_norm": 0.06396885961294174,
+      "learning_rate": 2.5247524752475248e-05,
+      "loss": 0.817,
+      "step": 2220
+    },
+    {
+      "epoch": 2.7567233384853167,
+      "grad_norm": 0.05814013257622719,
+      "learning_rate": 2.4009900990099006e-05,
+      "loss": 0.8384,
+      "step": 2230
+    },
+    {
+      "epoch": 2.7690880989180835,
+      "grad_norm": 0.07185972481966019,
+      "learning_rate": 2.277227722772277e-05,
+      "loss": 0.8143,
+      "step": 2240
+    },
+    {
+      "epoch": 2.78145285935085,
+      "grad_norm": 0.06624460965394974,
+      "learning_rate": 2.1534653465346532e-05,
+      "loss": 0.8296,
+      "step": 2250
+    },
+    {
+      "epoch": 2.7938176197836166,
+      "grad_norm": 0.06510159373283386,
+      "learning_rate": 2.0297029702970297e-05,
+      "loss": 0.8302,
+      "step": 2260
+    },
+    {
+      "epoch": 2.8061823802163834,
+      "grad_norm": 0.06376007944345474,
+      "learning_rate": 1.9059405940594058e-05,
+      "loss": 0.8173,
+      "step": 2270
+    },
+    {
+      "epoch": 2.81854714064915,
+      "grad_norm": 0.0644875094294548,
+      "learning_rate": 1.782178217821782e-05,
+      "loss": 0.8358,
+      "step": 2280
+    },
+    {
+      "epoch": 2.8309119010819166,
+      "grad_norm": 0.05571649968624115,
+      "learning_rate": 1.6584158415841584e-05,
+      "loss": 0.8139,
+      "step": 2290
+    },
+    {
+      "epoch": 2.8432766615146834,
+      "grad_norm": 0.06556589901447296,
+      "learning_rate": 1.5346534653465345e-05,
+      "loss": 0.8241,
+      "step": 2300
+    },
+    {
+      "epoch": 2.8556414219474497,
+      "grad_norm": 0.06124770641326904,
+      "learning_rate": 1.4108910891089108e-05,
+      "loss": 0.8184,
+      "step": 2310
+    },
+    {
+      "epoch": 2.8680061823802165,
+      "grad_norm": 0.06707081943750381,
+      "learning_rate": 1.287128712871287e-05,
+      "loss": 0.8194,
+      "step": 2320
+    },
+    {
+      "epoch": 2.880370942812983,
+      "grad_norm": 0.06600210070610046,
+      "learning_rate": 1.1633663366336632e-05,
+      "loss": 0.8614,
+      "step": 2330
+    },
+    {
+      "epoch": 2.8927357032457497,
+      "grad_norm": 0.06310021132230759,
+      "learning_rate": 1.0396039603960395e-05,
+      "loss": 0.8485,
+      "step": 2340
+    },
+    {
+      "epoch": 2.905100463678516,
+      "grad_norm": 0.06110014021396637,
+      "learning_rate": 9.158415841584158e-06,
+      "loss": 0.8171,
+      "step": 2350
+    },
+    {
+      "epoch": 2.917465224111283,
+      "grad_norm": 0.07767624408006668,
+      "learning_rate": 7.92079207920792e-06,
+      "loss": 0.8403,
+      "step": 2360
+    },
+    {
+      "epoch": 2.9298299845440496,
+      "grad_norm": 0.07363846898078918,
+      "learning_rate": 6.683168316831683e-06,
+      "loss": 0.8453,
+      "step": 2370
+    },
+    {
+      "epoch": 2.942194744976816,
+      "grad_norm": 0.06242545694112778,
+      "learning_rate": 5.445544554455446e-06,
+      "loss": 0.8448,
+      "step": 2380
+    },
+    {
+      "epoch": 2.954559505409583,
+      "grad_norm": 0.06873054057359695,
+      "learning_rate": 4.207920792079208e-06,
+      "loss": 0.8412,
+      "step": 2390
+    },
+    {
+      "epoch": 2.966924265842349,
+      "grad_norm": 0.06264466792345047,
+      "learning_rate": 2.97029702970297e-06,
+      "loss": 0.8426,
+      "step": 2400
+    },
+    {
+      "epoch": 2.979289026275116,
+      "grad_norm": 0.06905832886695862,
+      "learning_rate": 1.7326732673267324e-06,
+      "loss": 0.8469,
+      "step": 2410
+    },
+    {
+      "epoch": 2.9916537867078823,
+      "grad_norm": 0.061685774475336075,
+      "learning_rate": 4.95049504950495e-07,
+      "loss": 0.8339,
+      "step": 2420
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2424,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.723183421298311e+18,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": null
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:916d7fb58500db212016cbc23cab8f6e9cc2c103f69a6804826e7bb07c43a797
+size 5496