Training in progress, step 200, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +30 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +205 -0
last-checkpoint/trainer_state.json +1458 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: katuni4ka/tiny-random-dbrx
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "katuni4ka/tiny-random-dbrx",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "out_proj",
+    "Wqkv",
+    "layer"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
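
Review note: the `r` and `lora_alpha` values above determine how strongly the low-rank update is weighted. The sketch below is only an illustration under stated assumptions: it copies a subset of the config by hand and applies the usual LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when `use_rslora` is set). The helper name `lora_scaling` is hypothetical and not part of this commit.

```python
import json

# Subset of the adapter_config.json values shown in this diff (copied by hand).
ADAPTER_CONFIG = json.loads("""
{
  "peft_type": "LORA",
  "r": 32,
  "lora_alpha": 64,
  "lora_dropout": 0.1,
  "use_rslora": false,
  "target_modules": ["out_proj", "Wqkv", "layer"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Hypothetical helper: standard LoRA scales by lora_alpha / r, while
    rank-stabilized LoRA (use_rslora) scales by lora_alpha / sqrt(r).
    """
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / (r ** 0.5)
    return alpha / r


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"LoRA scaling for r={ADAPTER_CONFIG['r']}: {scaling}")
```

With the values in this checkpoint the factor comes out to 2.0, so the adapter update is applied at twice its raw magnitude.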

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64f30275384600729bcfd077d0123f3112d35ed4e0f19f813113cac6af12f8e6
+size 18064

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 100279,
+  "<|im_start|>": 100278
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:26d6bca4f3112ed4a2b3b5228a1d4cdb95afbe128e025d8098a40a7ab7983350
+size 40454

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:545b57e91dfef21c7c8335da28e1f286dbe989bb14026b513c833248ff318a63
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:acd26c01191934d645c94037f82c9dfd9bcc00424d2e29e5bbcd1e0b7a4da603
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|pad|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,205 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "100256": {
+      "content": "<||_unused_0_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100257": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100258": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100259": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100260": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100261": {
+      "content": "<||_unused_1_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100262": {
+      "content": "<||_unused_2_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100263": {
+      "content": "<||_unused_3_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100264": {
+      "content": "<||_unused_4_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100265": {
+      "content": "<||_unused_5_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100266": {
+      "content": "<||_unused_6_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100267": {
+      "content": "<||_unused_7_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100268": {
+      "content": "<||_unused_8_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100269": {
+      "content": "<||_unused_9_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100270": {
+      "content": "<||_unused_10_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100271": {
+      "content": "<||_unused_11_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100272": {
+      "content": "<||_unused_12_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100273": {
+      "content": "<||_unused_13_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100274": {
+      "content": "<||_unused_14_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100275": {
+      "content": "<||_unused_15_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100276": {
+      "content": "<|endofprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100277": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100278": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100279": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 32768,
+  "pad_token": "<|pad|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
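
Review note: the `chat_template` above wraps messages in `<|start_header_id|>`, `<|end_header_id|>` and `<|eot_id|>`, while `added_tokens_decoder` lists `<|im_start|>` and `<|im_end|>` instead. The sketch below is a rough consistency check, assuming the marker strings are copied by hand from this diff; `unknown_markers` is a hypothetical helper. It cannot see `vocab.json` (not rendered here), so it only shows which markers are absent from the listed special tokens, not whether they are missing from the vocabulary altogether.

```python
import re

# Special tokens listed in added_tokens_decoder in this diff (IDs 100256-100279).
KNOWN_SPECIAL_TOKENS = {
    "<|endoftext|>", "<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>",
    "<|endofprompt|>", "<|pad|>", "<|im_start|>", "<|im_end|>",
} | {f"<||_unused_{i}_||>" for i in range(16)}

# Literal fragment of the chat_template that builds each message (copied by hand).
CHAT_TEMPLATE_FRAGMENT = (
    "'<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'"
    " + message['content'] | trim + '<|eot_id|>'"
)


def unknown_markers(template: str, known: set) -> list:
    """Return <|...|> markers used in the template but absent from `known`."""
    found = re.findall(r"<\|[A-Za-z_]+\|>", template)
    return sorted(set(found) - known)


missing = unknown_markers(CHAT_TEMPLATE_FRAGMENT, KNOWN_SPECIAL_TOKENS)
print(missing)
# → ['<|end_header_id|>', '<|eot_id|>', '<|start_header_id|>']
```

If these markers are also absent from `vocab.json`, the tokenizer would presumably split them into ordinary sub-word pieces rather than single special tokens, which may be worth confirming before relying on this template.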

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1458 @@

+{
+  "best_metric": 11.5,
+  "best_model_checkpoint": "miner_id_24/checkpoint-200",
+  "epoch": 0.05731069560856795,
+  "eval_steps": 200,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00028655347804283973,
+      "grad_norm": 7.381157047348097e-05,
+      "learning_rate": 6.666666666666667e-06,
+      "loss": 46.0,
+      "step": 1
+    },
+    {
+      "epoch": 0.00028655347804283973,
+      "eval_loss": 11.5,
+      "eval_runtime": 1.0821,
+      "eval_samples_per_second": 130.301,
+      "eval_steps_per_second": 65.612,
+      "step": 1
+    },
+    {
+      "epoch": 0.0005731069560856795,
+      "grad_norm": 6.596604362130165e-05,
+      "learning_rate": 1.3333333333333333e-05,
+      "loss": 46.0,
+      "step": 2
+    },
+    {
+      "epoch": 0.0008596604341285192,
+      "grad_norm": 0.00010368386574555188,
+      "learning_rate": 2e-05,
+      "loss": 46.0,
+      "step": 3
+    },
+    {
+      "epoch": 0.001146213912171359,
+      "grad_norm": 0.0001162872213171795,
+      "learning_rate": 2.6666666666666667e-05,
+      "loss": 46.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.0014327673902141988,
+      "grad_norm": 7.124816329451278e-05,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 46.0,
+      "step": 5
+    },
+    {
+      "epoch": 0.0017193208682570384,
+      "grad_norm": 6.46381449769251e-05,
+      "learning_rate": 4e-05,
+      "loss": 46.0,
+      "step": 6
+    },
+    {
+      "epoch": 0.002005874346299878,
+      "grad_norm": 6.736363138770685e-05,
+      "learning_rate": 4.666666666666667e-05,
+      "loss": 46.0,
+      "step": 7
+    },
+    {
+      "epoch": 0.002292427824342718,
+      "grad_norm": 8.684724161867052e-05,
+      "learning_rate": 5.333333333333333e-05,
+      "loss": 46.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.0025789813023855577,
+      "grad_norm": 0.00011209803051315248,
+      "learning_rate": 6e-05,
+      "loss": 46.0,
+      "step": 9
+    },
+    {
+      "epoch": 0.0028655347804283976,
+      "grad_norm": 7.704443851253018e-05,
+      "learning_rate": 6.666666666666667e-05,
+      "loss": 46.0,
+      "step": 10
+    },
+    {
+      "epoch": 0.0031520882584712374,
+      "grad_norm": 0.00012871134094893932,
+      "learning_rate": 7.333333333333333e-05,
+      "loss": 46.0,
+      "step": 11
+    },
+    {
+      "epoch": 0.003438641736514077,
+      "grad_norm": 9.129566024057567e-05,
+      "learning_rate": 8e-05,
+      "loss": 46.0,
+      "step": 12
+    },
+    {
+      "epoch": 0.0037251952145569167,
+      "grad_norm": 0.00016532166046090424,
+      "learning_rate": 8.666666666666667e-05,
+      "loss": 46.0,
+      "step": 13
+    },
+    {
+      "epoch": 0.004011748692599756,
+      "grad_norm": 4.0770170016912743e-05,
+      "learning_rate": 9.333333333333334e-05,
+      "loss": 46.0,
+      "step": 14
+    },
+    {
+      "epoch": 0.004298302170642596,
+      "grad_norm": 6.018530257279053e-05,
+      "learning_rate": 0.0001,
+      "loss": 46.0,
+      "step": 15
+    },
+    {
+      "epoch": 0.004584855648685436,
+      "grad_norm": 0.00010712119546951726,
+      "learning_rate": 0.00010666666666666667,
+      "loss": 46.0,
+      "step": 16
+    },
+    {
+      "epoch": 0.004871409126728276,
+      "grad_norm": 9.569038957124576e-05,
+      "learning_rate": 0.00011333333333333334,
+      "loss": 46.0,
+      "step": 17
+    },
+    {
+      "epoch": 0.0051579626047711154,
+      "grad_norm": 4.893005097983405e-05,
+      "learning_rate": 0.00012,
+      "loss": 46.0,
+      "step": 18
+    },
+    {
+      "epoch": 0.005444516082813955,
+      "grad_norm": 6.882779416628182e-05,
+      "learning_rate": 0.00012666666666666666,
+      "loss": 46.0,
+      "step": 19
+    },
+    {
+      "epoch": 0.005731069560856795,
+      "grad_norm": 0.00010898563778027892,
+      "learning_rate": 0.00013333333333333334,
+      "loss": 46.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.006017623038899635,
+      "grad_norm": 5.298621908877976e-05,
+      "learning_rate": 0.00014,
+      "loss": 46.0,
+      "step": 21
+    },
+    {
+      "epoch": 0.006304176516942475,
+      "grad_norm": 8.256702130893245e-05,
+      "learning_rate": 0.00014666666666666666,
+      "loss": 46.0,
+      "step": 22
+    },
+    {
+      "epoch": 0.006590729994985314,
+      "grad_norm": 7.129500590963289e-05,
+      "learning_rate": 0.00015333333333333334,
+      "loss": 46.0,
+      "step": 23
+    },
+    {
+      "epoch": 0.006877283473028154,
+      "grad_norm": 6.098434096202254e-05,
+      "learning_rate": 0.00016,
+      "loss": 46.0,
+      "step": 24
+    },
+    {
+      "epoch": 0.0071638369510709935,
+      "grad_norm": 5.094590233056806e-05,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 46.0,
+      "step": 25
+    },
+    {
+      "epoch": 0.007450390429113833,
+      "grad_norm": 0.00010607678268570453,
+      "learning_rate": 0.00017333333333333334,
+      "loss": 46.0,
+      "step": 26
+    },
+    {
+      "epoch": 0.007736943907156673,
+      "grad_norm": 0.00013629230670630932,
+      "learning_rate": 0.00018,
+      "loss": 46.0,
+      "step": 27
+    },
+    {
+      "epoch": 0.008023497385199512,
+      "grad_norm": 5.341317955753766e-05,
+      "learning_rate": 0.0001866666666666667,
+      "loss": 46.0,
+      "step": 28
+    },
+    {
+      "epoch": 0.008310050863242352,
+      "grad_norm": 7.902220386313275e-05,
+      "learning_rate": 0.00019333333333333333,
+      "loss": 46.0,
+      "step": 29
+    },
+    {
+      "epoch": 0.008596604341285192,
+      "grad_norm": 0.00012108004011679441,
+      "learning_rate": 0.0002,
+      "loss": 46.0,
+      "step": 30
+    },
+    {
+      "epoch": 0.008883157819328032,
+      "grad_norm": 0.0001192674899357371,
+      "learning_rate": 0.00019999999546978955,
+      "loss": 46.0,
+      "step": 31
+    },
+    {
+      "epoch": 0.009169711297370872,
+      "grad_norm": 6.242323433980346e-05,
+      "learning_rate": 0.00019999998187915854,
+      "loss": 46.0,
+      "step": 32
+    },
+    {
+      "epoch": 0.009456264775413711,
+      "grad_norm": 0.00010374305566074327,
+      "learning_rate": 0.00019999995922810823,
+      "loss": 46.0,
+      "step": 33
+    },
+    {
+      "epoch": 0.009742818253456551,
+      "grad_norm": 7.662192365387455e-05,
+      "learning_rate": 0.0001999999275166407,
+      "loss": 46.0,
+      "step": 34
+    },
+    {
+      "epoch": 0.010029371731499391,
+      "grad_norm": 5.456238068290986e-05,
+      "learning_rate": 0.00019999988674475878,
+      "loss": 46.0,
+      "step": 35
+    },
+    {
+      "epoch": 0.010315925209542231,
+      "grad_norm": 5.168572533875704e-05,
+      "learning_rate": 0.0001999998369124662,
+      "loss": 46.0,
+      "step": 36
+    },
+    {
+      "epoch": 0.01060247868758507,
+      "grad_norm": 8.381979569094256e-05,
+      "learning_rate": 0.00019999977801976742,
+      "loss": 46.0,
+      "step": 37
+    },
+    {
+      "epoch": 0.01088903216562791,
+      "grad_norm": 0.00010368989751441404,
+      "learning_rate": 0.0001999997100666678,
+      "loss": 46.0,
+      "step": 38
+    },
+    {
+      "epoch": 0.01117558564367075,
+      "grad_norm": 9.2031252279412e-05,
+      "learning_rate": 0.00019999963305317354,
+      "loss": 46.0,
+      "step": 39
+    },
+    {
+      "epoch": 0.01146213912171359,
+      "grad_norm": 5.379249341785908e-05,
+      "learning_rate": 0.0001999995469792916,
+      "loss": 46.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.01174869259975643,
+      "grad_norm": 9.862228762358427e-05,
+      "learning_rate": 0.00019999945184502973,
+      "loss": 46.0,
+      "step": 41
+    },
+    {
+      "epoch": 0.01203524607779927,
+      "grad_norm": 9.657383634475991e-05,
+      "learning_rate": 0.0001999993476503966,
+      "loss": 46.0,
+      "step": 42
+    },
+    {
+      "epoch": 0.01232179955584211,
+      "grad_norm": 8.879863162292168e-05,
+      "learning_rate": 0.00019999923439540163,
+      "loss": 46.0,
+      "step": 43
+    },
+    {
+      "epoch": 0.01260835303388495,
+      "grad_norm": 6.925352499820292e-05,
+      "learning_rate": 0.00019999911208005508,
+      "loss": 46.0,
+      "step": 44
+    },
+    {
+      "epoch": 0.012894906511927788,
+      "grad_norm": 8.172217349056154e-05,
+      "learning_rate": 0.00019999898070436806,
+      "loss": 46.0,
+      "step": 45
+    },
+    {
+      "epoch": 0.013181459989970628,
+      "grad_norm": 8.213800174416974e-05,
+      "learning_rate": 0.00019999884026835246,
+      "loss": 46.0,
+      "step": 46
+    },
+    {
+      "epoch": 0.013468013468013467,
+      "grad_norm": 6.044626570655964e-05,
+      "learning_rate": 0.00019999869077202097,
+      "loss": 46.0,
+      "step": 47
+    },
+    {
+      "epoch": 0.013754566946056307,
+      "grad_norm": 6.844953168183565e-05,
+      "learning_rate": 0.00019999853221538714,
+      "loss": 46.0,
+      "step": 48
+    },
+    {
+      "epoch": 0.014041120424099147,
+      "grad_norm": 0.00013352354289963841,
+      "learning_rate": 0.00019999836459846538,
+      "loss": 46.0,
+      "step": 49
+    },
+    {
+      "epoch": 0.014327673902141987,
+      "grad_norm": 7.450298289768398e-05,
+      "learning_rate": 0.00019999818792127086,
+      "loss": 46.0,
+      "step": 50
+    },
+    {
+      "epoch": 0.014614227380184827,
+      "grad_norm": 7.48366946936585e-05,
+      "learning_rate": 0.00019999800218381956,
+      "loss": 46.0,
+      "step": 51
+    },
+    {
+      "epoch": 0.014900780858227667,
+      "grad_norm": 5.8166733651887625e-05,
+      "learning_rate": 0.00019999780738612835,
+      "loss": 46.0,
+      "step": 52
+    },
+    {
+      "epoch": 0.015187334336270506,
+      "grad_norm": 8.82293243193999e-05,
+      "learning_rate": 0.00019999760352821486,
+      "loss": 46.0,
+      "step": 53
+    },
+    {
+      "epoch": 0.015473887814313346,
+      "grad_norm": 8.331720164278522e-05,
+      "learning_rate": 0.00019999739061009753,
+      "loss": 46.0,
+      "step": 54
+    },
+    {
+      "epoch": 0.015760441292356184,
+      "grad_norm": 0.00011742674541892484,
+      "learning_rate": 0.00019999716863179572,
+      "loss": 46.0,
+      "step": 55
+    },
+    {
+      "epoch": 0.016046994770399024,
+      "grad_norm": 7.888959953561425e-05,
+      "learning_rate": 0.0001999969375933295,
+      "loss": 46.0,
+      "step": 56
+    },
+    {
+      "epoch": 0.016333548248441864,
+      "grad_norm": 9.424455492990091e-05,
+      "learning_rate": 0.00019999669749471978,
+      "loss": 46.0,
+      "step": 57
+    },
+    {
+      "epoch": 0.016620101726484704,
+      "grad_norm": 4.706982508650981e-05,
+      "learning_rate": 0.00019999644833598837,
+      "loss": 46.0,
+      "step": 58
+    },
+    {
+      "epoch": 0.016906655204527544,
+      "grad_norm": 9.459713328396901e-05,
+      "learning_rate": 0.00019999619011715778,
+      "loss": 46.0,
+      "step": 59
+    },
+    {
+      "epoch": 0.017193208682570384,
+      "grad_norm": 7.433071732521057e-05,
+      "learning_rate": 0.0001999959228382515,
+      "loss": 46.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.017479762160613223,
+      "grad_norm": 0.00013755627151113003,
+      "learning_rate": 0.00019999564649929362,
+      "loss": 46.0,
+      "step": 61
+    },
+    {
+      "epoch": 0.017766315638656063,
+      "grad_norm": 7.515963079640642e-05,
+      "learning_rate": 0.00019999536110030925,
+      "loss": 46.0,
+      "step": 62
+    },
+    {
+      "epoch": 0.018052869116698903,
+      "grad_norm": 9.297157521359622e-05,
+      "learning_rate": 0.00019999506664132425,
+      "loss": 46.0,
+      "step": 63
+    },
+    {
+      "epoch": 0.018339422594741743,
+      "grad_norm": 6.67767963022925e-05,
+      "learning_rate": 0.00019999476312236532,
+      "loss": 46.0,
+      "step": 64
+    },
+    {
+      "epoch": 0.018625976072784583,
+      "grad_norm": 0.00010362596367485821,
+      "learning_rate": 0.00019999445054345993,
+      "loss": 46.0,
+      "step": 65
+    },
+    {
+      "epoch": 0.018912529550827423,
+      "grad_norm": 0.00010248312901239842,
+      "learning_rate": 0.0001999941289046364,
+      "loss": 46.0,
+      "step": 66
+    },
+    {
+      "epoch": 0.019199083028870263,
+      "grad_norm": 0.00011670257663354278,
+      "learning_rate": 0.00019999379820592386,
+      "loss": 46.0,
+      "step": 67
+    },
+    {
+      "epoch": 0.019485636506913102,
+      "grad_norm": 9.483767644269392e-05,
+      "learning_rate": 0.00019999345844735227,
+      "loss": 46.0,
+      "step": 68
+    },
+    {
+      "epoch": 0.019772189984955942,
+      "grad_norm": 0.00012299351510591805,
+      "learning_rate": 0.00019999310962895246,
+      "loss": 46.0,
+      "step": 69
+    },
+    {
+      "epoch": 0.020058743462998782,
+      "grad_norm": 0.00013893700088374317,
+      "learning_rate": 0.000199992751750756,
+      "loss": 46.0,
+      "step": 70
+    },
+    {
+      "epoch": 0.020345296941041622,
+      "grad_norm": 0.00015965943748597056,
+      "learning_rate": 0.0001999923848127953,
+      "loss": 46.0,
+      "step": 71
+    },
+    {
+      "epoch": 0.020631850419084462,
+      "grad_norm": 0.00010530438157729805,
+      "learning_rate": 0.00019999200881510367,
+      "loss": 46.0,
+      "step": 72
+    },
+    {
+      "epoch": 0.0209184038971273,
+      "grad_norm": 0.00018572794215288013,
+      "learning_rate": 0.0001999916237577151,
+      "loss": 46.0,
+      "step": 73
+    },
+    {
+      "epoch": 0.02120495737517014,
+      "grad_norm": 7.796911086188629e-05,
+      "learning_rate": 0.00019999122964066453,
+      "loss": 46.0,
+      "step": 74
+    },
+    {
+      "epoch": 0.02149151085321298,
+      "grad_norm": 0.00011319840996293351,
+      "learning_rate": 0.00019999082646398765,
+      "loss": 46.0,
+      "step": 75
+    },
+    {
+      "epoch": 0.02177806433125582,
+      "grad_norm": 8.448911103187129e-05,
+      "learning_rate": 0.00019999041422772096,
+      "loss": 46.0,
+      "step": 76
+    },
+    {
+      "epoch": 0.02206461780929866,
+      "grad_norm": 0.00010140217636944726,
+      "learning_rate": 0.00019998999293190185,
+      "loss": 46.0,
+      "step": 77
+    },
+    {
+      "epoch": 0.0223511712873415,
+      "grad_norm": 0.00012014622916467488,
+      "learning_rate": 0.0001999895625765685,
+      "loss": 46.0,
+      "step": 78
+    },
+    {
+      "epoch": 0.02263772476538434,
+      "grad_norm": 0.00016250344924628735,
+      "learning_rate": 0.0001999891231617599,
+      "loss": 46.0,
+      "step": 79
+    },
+    {
+      "epoch": 0.02292427824342718,
+      "grad_norm": 0.0001607730082469061,
+      "learning_rate": 0.0001999886746875158,
+      "loss": 46.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.02321083172147002,
+      "grad_norm": 9.321732068201527e-05,
+      "learning_rate": 0.0001999882171538769,
+      "loss": 46.0,
+      "step": 81
+    },
+    {
+      "epoch": 0.02349738519951286,
+      "grad_norm": 8.782186341704801e-05,
+      "learning_rate": 0.00019998775056088465,
+      "loss": 46.0,
+      "step": 82
+    },
+    {
+      "epoch": 0.0237839386775557,
+      "grad_norm": 0.0001984712143894285,
+      "learning_rate": 0.0001999872749085813,
+      "loss": 46.0,
+      "step": 83
+    },
+    {
+      "epoch": 0.02407049215559854,
+      "grad_norm": 9.399009286426008e-05,
+      "learning_rate": 0.00019998679019700994,
+      "loss": 46.0,
+      "step": 84
+    },
+    {
+      "epoch": 0.02435704563364138,
+      "grad_norm": 0.00010639801621437073,
+      "learning_rate": 0.00019998629642621453,
+      "loss": 46.0,
+      "step": 85
+    },
+    {
+      "epoch": 0.02464359911168422,
+      "grad_norm": 0.0001523221581010148,
+      "learning_rate": 0.00019998579359623974,
+      "loss": 46.0,
+      "step": 86
+    },
+    {
+      "epoch": 0.02493015258972706,
+      "grad_norm": 0.0002136339171556756,
+      "learning_rate": 0.00019998528170713122,
+      "loss": 46.0,
+      "step": 87
+    },
+    {
+      "epoch": 0.0252167060677699,
+      "grad_norm": 0.0002231518883490935,
+      "learning_rate": 0.00019998476075893526,
+      "loss": 46.0,
+      "step": 88
+    },
+    {
+      "epoch": 0.025503259545812736,
+      "grad_norm": 0.0003373888903297484,
+      "learning_rate": 0.0001999842307516991,
+      "loss": 46.0,
+      "step": 89
+    },
+    {
+      "epoch": 0.025789813023855575,
+      "grad_norm": 0.0001387349038850516,
+      "learning_rate": 0.0001999836916854708,
+      "loss": 46.0,
+      "step": 90
+    },
+    {
+      "epoch": 0.026076366501898415,
+      "grad_norm": 0.0001619431423023343,
+      "learning_rate": 0.00019998314356029913,
+      "loss": 46.0,
+      "step": 91
+    },
+    {
+      "epoch": 0.026362919979941255,
+      "grad_norm": 0.00016960031643975526,
+      "learning_rate": 0.00019998258637623378,
+      "loss": 46.0,
+      "step": 92
+    },
+    {
+      "epoch": 0.026649473457984095,
+      "grad_norm": 0.00013296969700604677,
+      "learning_rate": 0.00019998202013332526,
+      "loss": 46.0,
+      "step": 93
+    },
+    {
+      "epoch": 0.026936026936026935,
+      "grad_norm": 0.00015429664927069098,
+      "learning_rate": 0.0001999814448316248,
+      "loss": 46.0,
+      "step": 94
+    },
+    {
+      "epoch": 0.027222580414069775,
+      "grad_norm": 0.00019981182413175702,
+      "learning_rate": 0.00019998086047118464,
+      "loss": 46.0,
+      "step": 95
+    },
+    {
+      "epoch": 0.027509133892112615,
+      "grad_norm": 0.00020218861754983664,
+      "learning_rate": 0.0001999802670520576,
+      "loss": 46.0,
+      "step": 96
+    },
+    {
+      "epoch": 0.027795687370155454,
+      "grad_norm": 0.0001801289909053594,
+      "learning_rate": 0.00019997966457429756,
+      "loss": 46.0,
+      "step": 97
+    },
+    {
+      "epoch": 0.028082240848198294,
+      "grad_norm": 0.0003259029472246766,
+      "learning_rate": 0.000199979053037959,
+      "loss": 46.0,
+      "step": 98
+    },
+    {
+      "epoch": 0.028368794326241134,
+      "grad_norm": 0.00010903332440648228,
+      "learning_rate": 0.0001999784324430974,
+      "loss": 46.0,
+      "step": 99
+    },
+    {
+      "epoch": 0.028655347804283974,
+      "grad_norm": 0.00010584478877717629,
+      "learning_rate": 0.00019997780278976901,
+      "loss": 46.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.028941901282326814,
+      "grad_norm": 0.00024283739912789315,
+      "learning_rate": 0.0001999771640780308,
+      "loss": 46.0,
+      "step": 101
+    },
+    {
+      "epoch": 0.029228454760369654,
+      "grad_norm": 0.00020966072042938322,
+      "learning_rate": 0.0001999765163079407,
+      "loss": 46.0,
+      "step": 102
+    },
+    {
+      "epoch": 0.029515008238412493,
+      "grad_norm": 0.00016458171012345701,
+      "learning_rate": 0.00019997585947955737,
+      "loss": 46.0,
+      "step": 103
+    },
+    {
+      "epoch": 0.029801561716455333,
+      "grad_norm": 0.0002142781886504963,
+      "learning_rate": 0.0001999751935929403,
+      "loss": 46.0,
+      "step": 104
+    },
+    {
+      "epoch": 0.030088115194498173,
+      "grad_norm": 0.00018552150868345052,
+      "learning_rate": 0.0001999745186481499,
+      "loss": 46.0,
+      "step": 105
+    },
+    {
+      "epoch": 0.030374668672541013,
+      "grad_norm": 0.00021552396356128156,
+      "learning_rate": 0.00019997383464524728,
+      "loss": 46.0,
+      "step": 106
+    },
+    {
+      "epoch": 0.030661222150583853,
+      "grad_norm": 0.00038846713141538203,
+      "learning_rate": 0.00019997314158429438,
+      "loss": 46.0,
+      "step": 107
+    },
+    {
+      "epoch": 0.030947775628626693,
+      "grad_norm": 0.00019503093790262938,
+      "learning_rate": 0.00019997243946535406,
+      "loss": 46.0,
+      "step": 108
+    },
+    {
+      "epoch": 0.031234329106669532,
+      "grad_norm": 0.0003047584614250809,
+      "learning_rate": 0.00019997172828848986,
+      "loss": 46.0,
+      "step": 109
+    },
+    {
+      "epoch": 0.03152088258471237,
+      "grad_norm": 0.00025872603873722255,
+      "learning_rate": 0.00019997100805376627,
+      "loss": 46.0,
+      "step": 110
+    },
+    {
+      "epoch": 0.03180743606275521,
+      "grad_norm": 0.0003769325849134475,
+      "learning_rate": 0.00019997027876124854,
+      "loss": 46.0,
+      "step": 111
+    },
+    {
+      "epoch": 0.03209398954079805,
+      "grad_norm": 0.0004575467901304364,
+      "learning_rate": 0.00019996954041100274,
+      "loss": 46.0,
+      "step": 112
+    },
+    {
+      "epoch": 0.03238054301884089,
+      "grad_norm": 0.0003370651975274086,
+      "learning_rate": 0.00019996879300309575,
+      "loss": 46.0,
+      "step": 113
+    },
+    {
+      "epoch": 0.03266709649688373,
+      "grad_norm": 0.0002681570767890662,
+      "learning_rate": 0.00019996803653759532,
+      "loss": 46.0,
+      "step": 114
+    },
+    {
+      "epoch": 0.03295364997492657,
+      "grad_norm": 0.0002245080249849707,
+      "learning_rate": 0.00019996727101456995,
+      "loss": 46.0,
+      "step": 115
+    },
+    {
+      "epoch": 0.03324020345296941,
+      "grad_norm": 0.0002516446984373033,
+      "learning_rate": 0.00019996649643408906,
+      "loss": 46.0,
+      "step": 116
+    },
+    {
+      "epoch": 0.03352675693101225,
+      "grad_norm": 0.000306058645946905,
+      "learning_rate": 0.00019996571279622276,
+      "loss": 46.0,
+      "step": 117
+    },
+    {
+      "epoch": 0.03381331040905509,
+      "grad_norm": 0.0003231150913052261,
+      "learning_rate": 0.0001999649201010421,
+      "loss": 46.0,
+      "step": 118
+    },
+    {
+      "epoch": 0.03409986388709793,
+      "grad_norm": 0.00030212162528187037,
+      "learning_rate": 0.00019996411834861887,
+      "loss": 46.0,
+      "step": 119
+    },
+    {
+      "epoch": 0.03438641736514077,
+      "grad_norm": 0.0002937183016911149,
+      "learning_rate": 0.00019996330753902574,
+      "loss": 46.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.03467297084318361,
+      "grad_norm": 0.0003606914251577109,
+      "learning_rate": 0.00019996248767233617,
+      "loss": 46.0,
+      "step": 121
+    },
+    {
+      "epoch": 0.03495952432122645,
+      "grad_norm": 0.0003355994704179466,
+      "learning_rate": 0.00019996165874862443,
+      "loss": 46.0,
+      "step": 122
+    },
+    {
+      "epoch": 0.03524607779926929,
+      "grad_norm": 0.00034902329207398,
+      "learning_rate": 0.0001999608207679656,
+      "loss": 46.0,
+      "step": 123
+    },
+    {
+      "epoch": 0.03553263127731213,
+      "grad_norm": 0.0002104766172124073,
+      "learning_rate": 0.00019995997373043568,
+      "loss": 46.0,
+      "step": 124
+    },
+    {
+      "epoch": 0.03581918475535497,
+      "grad_norm": 0.00023777480237185955,
+      "learning_rate": 0.00019995911763611132,
+      "loss": 46.0,
+      "step": 125
+    },
+    {
+      "epoch": 0.036105738233397806,
+      "grad_norm": 0.00029851359431631863,
+      "learning_rate": 0.00019995825248507015,
+      "loss": 46.0,
+      "step": 126
+    },
+    {
+      "epoch": 0.03639229171144065,
+      "grad_norm": 0.0004646476882044226,
+      "learning_rate": 0.00019995737827739057,
+      "loss": 46.0,
+      "step": 127
+    },
+    {
+      "epoch": 0.036678845189483486,
+      "grad_norm": 0.00029485157574526966,
+      "learning_rate": 0.0001999564950131517,
+      "loss": 46.0,
+      "step": 128
+    },
+    {
+      "epoch": 0.03696539866752633,
+      "grad_norm": 0.0007383066695183516,
+      "learning_rate": 0.00019995560269243367,
+      "loss": 46.0,
+      "step": 129
+    },
+    {
+      "epoch": 0.037251952145569166,
+      "grad_norm": 0.0003343581047374755,
+      "learning_rate": 0.00019995470131531725,
+      "loss": 46.0,
+      "step": 130
+    },
+    {
+      "epoch": 0.03753850562361201,
+      "grad_norm": 0.0004639736143872142,
+      "learning_rate": 0.00019995379088188418,
+      "loss": 46.0,
+      "step": 131
+    },
+    {
+      "epoch": 0.037825059101654845,
+      "grad_norm": 0.0007386133074760437,
+      "learning_rate": 0.0001999528713922169,
+      "loss": 46.0,
+      "step": 132
+    },
+    {
+      "epoch": 0.03811161257969769,
+      "grad_norm": 0.0003995689039584249,
+      "learning_rate": 0.0001999519428463987,
+      "loss": 46.0,
+      "step": 133
+    },
+    {
+      "epoch": 0.038398166057740525,
+      "grad_norm": 0.00027600504108704627,
+      "learning_rate": 0.00019995100524451372,
+      "loss": 46.0,
+      "step": 134
+    },
+    {
+      "epoch": 0.03868471953578337,
+      "grad_norm": 0.00037365983007475734,
+      "learning_rate": 0.00019995005858664696,
+      "loss": 46.0,
+      "step": 135
+    },
+    {
+      "epoch": 0.038971273013826205,
+      "grad_norm": 0.0006544017815031111,
+      "learning_rate": 0.00019994910287288417,
+      "loss": 46.0,
+      "step": 136
+    },
+    {
+      "epoch": 0.03925782649186905,
+      "grad_norm": 0.0004640940751414746,
+      "learning_rate": 0.0001999481381033119,
+      "loss": 46.0,
+      "step": 137
+    },
+    {
+      "epoch": 0.039544379969911884,
+      "grad_norm": 0.00038468436105176806,
+      "learning_rate": 0.00019994716427801766,
+      "loss": 46.0,
+      "step": 138
+    },
+    {
+      "epoch": 0.03983093344795473,
+      "grad_norm": 0.0009553630952723324,
+      "learning_rate": 0.0001999461813970896,
+      "loss": 46.0,
+      "step": 139
+    },
+    {
+      "epoch": 0.040117486925997564,
+      "grad_norm": 0.0002630364033393562,
+      "learning_rate": 0.00019994518946061675,
+      "loss": 46.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.04040404040404041,
+      "grad_norm": 0.0002757444162853062,
+      "learning_rate": 0.00019994418846868906,
+      "loss": 46.0,
+      "step": 141
+    },
+    {
+      "epoch": 0.040690593882083244,
+      "grad_norm": 0.00042787828715518117,
+      "learning_rate": 0.00019994317842139716,
+      "loss": 46.0,
+      "step": 142
+    },
+    {
+      "epoch": 0.04097714736012608,
+      "grad_norm": 0.000318807055009529,
+      "learning_rate": 0.0001999421593188326,
+      "loss": 46.0,
+      "step": 143
+    },
+    {
+      "epoch": 0.041263700838168924,
+      "grad_norm": 0.0007039292831905186,
+      "learning_rate": 0.00019994113116108772,
+      "loss": 46.0,
+      "step": 144
+    },
+    {
+      "epoch": 0.04155025431621176,
+      "grad_norm": 0.0005391126614995301,
+      "learning_rate": 0.00019994009394825568,
+      "loss": 46.0,
+      "step": 145
+    },
+    {
+      "epoch": 0.0418368077942546,
+      "grad_norm": 0.0006373688811436296,
+      "learning_rate": 0.0001999390476804304,
+      "loss": 46.0,
+      "step": 146
+    },
+    {
+      "epoch": 0.04212336127229744,
+      "grad_norm": 0.00032521801767870784,
+      "learning_rate": 0.00019993799235770674,
+      "loss": 46.0,
+      "step": 147
+    },
+    {
+      "epoch": 0.04240991475034028,
+      "grad_norm": 0.0004875020822510123,
+      "learning_rate": 0.00019993692798018028,
+      "loss": 46.0,
+      "step": 148
+    },
+    {
+      "epoch": 0.04269646822838312,
+      "grad_norm": 0.0006042442400939763,
+      "learning_rate": 0.00019993585454794748,
+      "loss": 46.0,
+      "step": 149
+    },
+    {
+      "epoch": 0.04298302170642596,
+      "grad_norm": 0.00043931492837145925,
+      "learning_rate": 0.00019993477206110559,
+      "loss": 46.0,
+      "step": 150
+    },
+    {
+      "epoch": 0.0432695751844688,
+      "grad_norm": 0.00035808575921691954,
+      "learning_rate": 0.00019993368051975268,
+      "loss": 46.0,
+      "step": 151
+    },
+    {
+      "epoch": 0.04355612866251164,
+      "grad_norm": 0.0004806216456927359,
+      "learning_rate": 0.00019993257992398765,
+      "loss": 46.0,
+      "step": 152
+    },
+    {
+      "epoch": 0.04384268214055448,
+      "grad_norm": 0.00039719213964417577,
+      "learning_rate": 0.0001999314702739102,
+      "loss": 46.0,
+      "step": 153
+    },
+    {
+      "epoch": 0.04412923561859732,
+      "grad_norm": 0.0004144109843764454,
+      "learning_rate": 0.00019993035156962093,
+      "loss": 46.0,
+      "step": 154
+    },
+    {
+      "epoch": 0.04441578909664016,
+      "grad_norm": 0.0006228784914128482,
+      "learning_rate": 0.00019992922381122113,
+      "loss": 46.0,
+      "step": 155
+    },
+    {
+      "epoch": 0.044702342574683,
+      "grad_norm": 0.0006269604782573879,
+      "learning_rate": 0.00019992808699881303,
+      "loss": 46.0,
+      "step": 156
+    },
+    {
+      "epoch": 0.04498889605272584,
+      "grad_norm": 0.00046757658128626645,
+      "learning_rate": 0.0001999269411324996,
+      "loss": 46.0,
+      "step": 157
+    },
+    {
+      "epoch": 0.04527544953076868,
+      "grad_norm": 0.00044586771400645375,
+      "learning_rate": 0.00019992578621238466,
+      "loss": 46.0,
+      "step": 158
+    },
+    {
+      "epoch": 0.04556200300881152,
+      "grad_norm": 0.0005216663703322411,
+      "learning_rate": 0.0001999246222385729,
+      "loss": 46.0,
+      "step": 159
+    },
+    {
+      "epoch": 0.04584855648685436,
+      "grad_norm": 0.00044177312520332634,
+      "learning_rate": 0.00019992344921116972,
+      "loss": 46.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.0461351099648972,
+      "grad_norm": 0.0010134356562048197,
+      "learning_rate": 0.00019992226713028138,
+      "loss": 46.0,
+      "step": 161
+    },
+    {
+      "epoch": 0.04642166344294004,
+      "grad_norm": 0.0005053270142525434,
+      "learning_rate": 0.00019992107599601508,
+      "loss": 46.0,
+      "step": 162
+    },
+    {
+      "epoch": 0.04670821692098288,
+      "grad_norm": 0.0008667529909871519,
+      "learning_rate": 0.00019991987580847867,
+      "loss": 46.0,
+      "step": 163
+    },
+    {
+      "epoch": 0.04699477039902572,
+      "grad_norm": 0.0005064812139607966,
+      "learning_rate": 0.00019991866656778092,
+      "loss": 46.0,
+      "step": 164
+    },
+    {
+      "epoch": 0.04728132387706856,
+      "grad_norm": 0.00033950377837754786,
+      "learning_rate": 0.00019991744827403137,
+      "loss": 46.0,
+      "step": 165
+    },
+    {
+      "epoch": 0.0475678773551114,
+      "grad_norm": 0.0006410735077224672,
+      "learning_rate": 0.0001999162209273404,
+      "loss": 46.0,
+      "step": 166
+    },
+    {
+      "epoch": 0.047854430833154236,
+      "grad_norm": 0.0011909378226846457,
+      "learning_rate": 0.0001999149845278192,
+      "loss": 46.0,
+      "step": 167
+    },
+    {
+      "epoch": 0.04814098431119708,
+      "grad_norm": 0.0009707885328680277,
+      "learning_rate": 0.00019991373907557987,
+      "loss": 46.0,
+      "step": 168
+    },
+    {
+      "epoch": 0.048427537789239916,
+      "grad_norm": 0.0009485428454354405,
+      "learning_rate": 0.0001999124845707352,
+      "loss": 46.0,
+      "step": 169
+    },
+    {
+      "epoch": 0.04871409126728276,
+      "grad_norm": 0.0008028754382394254,
+      "learning_rate": 0.00019991122101339884,
+      "loss": 46.0,
+      "step": 170
+    },
+    {
+      "epoch": 0.049000644745325596,
+      "grad_norm": 0.0008805354009382427,
+      "learning_rate": 0.00019990994840368527,
+      "loss": 46.0,
+      "step": 171
+    },
+    {
+      "epoch": 0.04928719822336844,
+      "grad_norm": 0.0005341559299267828,
+      "learning_rate": 0.00019990866674170983,
+      "loss": 46.0,
+      "step": 172
+    },
+    {
+      "epoch": 0.049573751701411276,
+      "grad_norm": 0.0007721466827206314,
+      "learning_rate": 0.00019990737602758863,
+      "loss": 46.0,
+      "step": 173
+    },
+    {
+      "epoch": 0.04986030517945412,
+      "grad_norm": 0.0012437815312296152,
+      "learning_rate": 0.0001999060762614386,
+      "loss": 46.0,
+      "step": 174
+    },
+    {
+      "epoch": 0.050146858657496955,
+      "grad_norm": 0.0005417719949036837,
+      "learning_rate": 0.00019990476744337753,
+      "loss": 46.0,
+      "step": 175
+    },
+    {
+      "epoch": 0.0504334121355398,
+      "grad_norm": 0.000531162484548986,
+      "learning_rate": 0.00019990344957352397,
+      "loss": 46.0,
+      "step": 176
+    },
+    {
+      "epoch": 0.050719965613582635,
+      "grad_norm": 0.0010336303384974599,
+      "learning_rate": 0.00019990212265199738,
+      "loss": 46.0,
+      "step": 177
+    },
+    {
+      "epoch": 0.05100651909162547,
+      "grad_norm": 0.0006930266390554607,
+      "learning_rate": 0.00019990078667891792,
+      "loss": 46.0,
+      "step": 178
+    },
+    {
+      "epoch": 0.051293072569668315,
+      "grad_norm": 0.0005671161925420165,
+      "learning_rate": 0.00019989944165440667,
+      "loss": 46.0,
+      "step": 179
+    },
+    {
+      "epoch": 0.05157962604771115,
+      "grad_norm": 0.0005927831516601145,
+      "learning_rate": 0.00019989808757858547,
+      "loss": 46.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.051866179525753994,
+      "grad_norm": 0.0014029775047674775,
+      "learning_rate": 0.00019989672445157707,
+      "loss": 46.0,
+      "step": 181
+    },
+    {
+      "epoch": 0.05215273300379683,
+      "grad_norm": 0.0005633488181047142,
+      "learning_rate": 0.0001998953522735049,
+      "loss": 46.0,
+      "step": 182
+    },
+    {
+      "epoch": 0.052439286481839674,
+      "grad_norm": 0.0008726881351321936,
+      "learning_rate": 0.00019989397104449327,
+      "loss": 46.0,
+      "step": 183
+    },
+    {
+      "epoch": 0.05272583995988251,
+      "grad_norm": 0.0010581790702417493,
+      "learning_rate": 0.00019989258076466744,
+      "loss": 46.0,
+      "step": 184
+    },
+    {
+      "epoch": 0.053012393437925354,
+      "grad_norm": 0.001107684918679297,
+      "learning_rate": 0.00019989118143415327,
+      "loss": 46.0,
+      "step": 185
+    },
+    {
+      "epoch": 0.05329894691596819,
+      "grad_norm": 0.0003141145862173289,
+      "learning_rate": 0.00019988977305307758,
+      "loss": 46.0,
+      "step": 186
+    },
+    {
+      "epoch": 0.05358550039401103,
+      "grad_norm": 0.0005039018578827381,
+      "learning_rate": 0.00019988835562156798,
+      "loss": 46.0,
+      "step": 187
+    },
+    {
+      "epoch": 0.05387205387205387,
+      "grad_norm": 0.0005364140379242599,
+      "learning_rate": 0.00019988692913975288,
+      "loss": 46.0,
+      "step": 188
+    },
+    {
+      "epoch": 0.05415860735009671,
+      "grad_norm": 0.0011345255188643932,
+      "learning_rate": 0.00019988549360776153,
+      "loss": 46.0,
+      "step": 189
+    },
+    {
+      "epoch": 0.05444516082813955,
+      "grad_norm": 0.0007594748749397695,
+      "learning_rate": 0.00019988404902572402,
+      "loss": 46.0,
+      "step": 190
+    },
+    {
+      "epoch": 0.05473171430618239,
+      "grad_norm": 0.0011707482626661658,
+      "learning_rate": 0.0001998825953937712,
+      "loss": 46.0,
+      "step": 191
+    },
+    {
+      "epoch": 0.05501826778422523,
+      "grad_norm": 0.0010088670533150434,
+      "learning_rate": 0.0001998811327120348,
+      "loss": 46.0,
+      "step": 192
+    },
+    {
+      "epoch": 0.05530482126226807,
+      "grad_norm": 0.0010642173001542687,
+      "learning_rate": 0.00019987966098064733,
+      "loss": 46.0,
+      "step": 193
+    },
+    {
+      "epoch": 0.05559137474031091,
+      "grad_norm": 0.0011455731000751257,
+      "learning_rate": 0.00019987818019974216,
+      "loss": 46.0,
+      "step": 194
+    },
+    {
+      "epoch": 0.05587792821835375,
+      "grad_norm": 0.0006063128239475191,
+      "learning_rate": 0.00019987669036945343,
+      "loss": 46.0,
+      "step": 195
+    },
+    {
+      "epoch": 0.05616448169639659,
+      "grad_norm": 0.00092441460583359,
+      "learning_rate": 0.00019987519148991612,
+      "loss": 46.0,
+      "step": 196
+    },
+    {
+      "epoch": 0.05645103517443943,
+      "grad_norm": 0.0008765366510488093,
+      "learning_rate": 0.00019987368356126604,
+      "loss": 46.0,
+      "step": 197
+    },
+    {
+      "epoch": 0.05673758865248227,
+      "grad_norm": 0.0011625416809692979,
+      "learning_rate": 0.00019987216658363984,
+      "loss": 46.0,
+      "step": 198
+    },
+    {
+      "epoch": 0.05702414213052511,
+      "grad_norm": 0.0007807817892171443,
+      "learning_rate": 0.0001998706405571749,
+      "loss": 46.0,
+      "step": 199
+    },
+    {
+      "epoch": 0.05731069560856795,
+      "grad_norm": 0.001416051178239286,
+      "learning_rate": 0.00019986910548200955,
+      "loss": 46.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.05731069560856795,
+      "eval_loss": 11.5,
+      "eval_runtime": 1.0416,
+      "eval_samples_per_second": 135.365,
+      "eval_steps_per_second": 68.162,
+      "step": 200
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 10467,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 3,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1999837593600.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
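
A minimal sketch of how a state file shaped like the one above could be inspected. It is not part of the commit. The helper name `summarize` and the inline `SAMPLE_STATE` excerpt are illustrative assumptions; the values in the excerpt are copied from the step 199 and step 200 entries. It splits `log_history` into training and evaluation records, checks whether the logged training loss ever changes, and reports progress against `max_steps`.

```python
import json

# Hypothetical excerpt shaped like the trainer_state.json entries above;
# the numbers are copied from the step 199-200 records in this diff.
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 0.05702414213052511, "grad_norm": 0.0007807817892171443,
     "learning_rate": 0.0001998706405571749, "loss": 46.0, "step": 199},
    {"epoch": 0.05731069560856795, "grad_norm": 0.001416051178239286,
     "learning_rate": 0.00019986910548200955, "loss": 46.0, "step": 200},
    {"epoch": 0.05731069560856795, "eval_loss": 11.5, "eval_runtime": 1.0416,
     "eval_samples_per_second": 135.365, "eval_steps_per_second": 68.162,
     "step": 200}
  ],
  "logging_steps": 1,
  "max_steps": 10467,
  "save_steps": 200
}
""")


def summarize(state):
    """Summarize a trainer-state dict (illustrative helper, not a library API)."""
    # Training records carry "loss"; evaluation records carry "eval_loss".
    train = [e for e in state["log_history"] if "loss" in e]
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    distinct_losses = {e["loss"] for e in train}
    return {
        "train_steps_logged": len(train),
        "eval_points": len(evals),
        # True when every logged training loss has the same value.
        "loss_is_constant": len(distinct_losses) == 1,
        # Fraction of max_steps reached by the last logged training step.
        "progress": train[-1]["step"] / state["max_steps"] if train else 0.0,
    }


summary = summarize(SAMPLE_STATE)
print(summary)
```

To run it against a real checkpoint, one would replace `SAMPLE_STATE` with `json.load(open(path))`, where `path` points at the checkpoint's `trainer_state.json`.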

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:23d39dd8e26d1ac93b0c23933cdca1cb83315b06d440ce30d09da1f97075ba9b
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.