TimLukaHorstmann committed
Commit 5895355 · Parent(s): 20798f3

Updated and cleaned model + inference code + colab
This view is limited to 50 files because the commit contains too many changes.

Files changed (50):
  1. README.md +88 -58
  2. added_tokens.json +0 -24
  3. bigscience_T0_3B_ssml/added_tokens.json +0 -105
  4. bigscience_T0_3B_ssml/checkpoint-12/added_tokens.json +0 -105
  5. bigscience_T0_3B_ssml/checkpoint-12/config.json +0 -32
  6. bigscience_T0_3B_ssml/checkpoint-12/generation_config.json +0 -7
  7. bigscience_T0_3B_ssml/checkpoint-12/model.safetensors.index.json +0 -565
  8. bigscience_T0_3B_ssml/checkpoint-12/special_tokens_map.json +0 -125
  9. bigscience_T0_3B_ssml/checkpoint-12/tokenizer_config.json +0 -965
  10. bigscience_T0_3B_ssml/checkpoint-12/trainer_state.json +0 -41
  11. bigscience_T0_3B_ssml/config.json +0 -32
  12. bigscience_T0_3B_ssml/generation_config.json +0 -7
  13. bigscience_T0_3B_ssml/model.safetensors.index.json +0 -565
  14. bigscience_T0_3B_ssml/special_tokens_map.json +0 -125
  15. bigscience_T0_3B_ssml/tokenizer_config.json +0 -965
  16. chat_template.jinja +0 -54
  17. checkpoint-735/README.md +0 -202
  18. checkpoint-735/adapter_config.json +0 -39
  19. checkpoint-735/added_tokens.json +0 -24
  20. checkpoint-735/chat_template.jinja +0 -54
  21. checkpoint-735/merges.txt +0 -0
  22. checkpoint-735/special_tokens_map.json +0 -31
  23. checkpoint-735/tokenizer_config.json +0 -207
  24. checkpoint-735/trainer_state.json +0 -286
  25. checkpoint-735/vocab.json +0 -0
  26. facebook_bart-base_ssml/added_tokens.json +0 -6
  27. facebook_bart-base_ssml/checkpoint-120/added_tokens.json +0 -6
  28. facebook_bart-base_ssml/checkpoint-120/config.json +0 -73
  29. facebook_bart-base_ssml/checkpoint-120/generation_config.json +0 -13
  30. facebook_bart-base_ssml/checkpoint-120/merges.txt +0 -0
  31. facebook_bart-base_ssml/checkpoint-120/special_tokens_map.json +0 -51
  32. facebook_bart-base_ssml/checkpoint-120/tokenizer_config.json +0 -89
  33. facebook_bart-base_ssml/checkpoint-120/trainer_state.json +0 -41
  34. facebook_bart-base_ssml/checkpoint-120/vocab.json +0 -0
  35. facebook_bart-base_ssml/checkpoint-3/added_tokens.json +0 -5
  36. facebook_bart-base_ssml/checkpoint-3/config.json +0 -73
  37. facebook_bart-base_ssml/checkpoint-3/generation_config.json +0 -13
  38. facebook_bart-base_ssml/checkpoint-3/merges.txt +0 -0
  39. facebook_bart-base_ssml/checkpoint-3/special_tokens_map.json +0 -51
  40. facebook_bart-base_ssml/checkpoint-3/tokenizer_config.json +0 -81
  41. facebook_bart-base_ssml/checkpoint-3/trainer_state.json +0 -33
  42. facebook_bart-base_ssml/checkpoint-3/vocab.json +0 -0
  43. facebook_bart-base_ssml/checkpoint-990/added_tokens.json +0 -6
  44. facebook_bart-base_ssml/checkpoint-990/config.json +0 -73
  45. facebook_bart-base_ssml/checkpoint-990/generation_config.json +0 -13
  46. facebook_bart-base_ssml/checkpoint-990/merges.txt +0 -0
  47. facebook_bart-base_ssml/checkpoint-990/special_tokens_map.json +0 -51
  48. facebook_bart-base_ssml/checkpoint-990/tokenizer_config.json +0 -89
  49. facebook_bart-base_ssml/checkpoint-990/trainer_state.json +0 -33
  50. facebook_bart-base_ssml/checkpoint-990/vocab.json +0 -0
README.md CHANGED
@@ -2,91 +2,76 @@
  license: apache-2.0
  base_model: Qwen/Qwen2.5-7B
  library_name: peft
  tags:
  - text-to-speech
- - ssml
- - qwen2.5
  - lora
  - peft
- language:
- - en
- - fr
  pipeline_tag: text-generation
  ---

- # 🗣️ ssml-text2breaks-fr-lora
-
- **ssml-text2breaks-fr-lora** is a LoRA adapter built on top of `Qwen/Qwen2.5-7B`, trained to predict **symbolic pause markers** (e.g., `#250`, `#500`) in raw French text. These symbolic tags indicate appropriate prosodic boundaries for speech synthesis systems.

- This model is the **first stage** in the cascaded pipeline presented in:

- > **"Improving French Synthetic Speech Quality via SSML Prosody Control"**
- > *Nassima Ould-Ouali, Éric Moulines* – ICNLSP 2025 (*Springer LNCS*, accepted)

- It is designed to be followed by [`ssml-break2ssml-fr-lora`](https://huggingface.co/nassimaODL/ssml-break2ssml-fr-lora), which converts symbolic markers into valid SSML tags.
-
- ---

  ## 🧩 Pipeline Overview

- | Stage | Model Name | Description |
- |-------|------------|-------------|
- | 1️⃣ | `ssml-text2breaks-fr-lora` | Predicts symbolic pause markers such as `#250`, `#500` |
- | 2️⃣ | `ssml-break2ssml-fr-lora` | Converts symbolic markers into `<break time="..."/>` SSML tags |
-
- ---

  ## ✨ Example

  **Input:**
-
- ```text
- Bonjour je m'appelle Bertrand Perier. Je suis avocat à la cour.
-
  ```
-
- **Output**
- ```text
- Bonjour#250 je m'appelle Bertrand Perier.#500 Je suis avocat à la cour.
-
  ```

- ## 🧠 Model Details
-
- - **Base Model**: Qwen/Qwen2.5-7B
- - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- - **LoRA Rank**: 8
- - **LoRA Alpha**: 16
- - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- - **Training Epochs**: 5
- - **Batch Size**: 1 (with gradient accumulation)
- - **Learning Rate**: 3e-4

- ## 🚀 How to run the code

  ```python
- import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from peft import PeftModel

  # Load base model and tokenizer
  base_model = AutoModelForCausalLM.from_pretrained(
      "Qwen/Qwen2.5-7B",
-     torch_dtype=torch.bfloat16,
      device_map="auto"
  )
  tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

  # Load LoRA adapter
- model = PeftModel.from_pretrained(base_model, "jonahdvt/qwen-ssml-lora")

  # Prepare input
- instruction = "Convert text to SSML with pauses:"
- text = "Hello, how are you today? I hope everything is going well."
- formatted_input = f"### Task:\n{instruction}\n\n### Text:\n{text}\n\n### SSML:\n"

  # Generate
  inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
@@ -100,25 +85,70 @@ with torch.no_grad():
  )

  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
- ssml_output = response.split("### SSML:\n")[-1]
- print(ssml_output)
  ```

- ## Citation
- If you use this model in your research, please cite:
- ```text
  @inproceedings{ould-ouali2025_improving,
    title = {Improving Synthetic Speech Quality via SSML Prosody Control},
    author = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
- booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)}, % TODO: verify the exact title used by the conference
    year = {2025},
- pages = {XX--YY}, % TODO
- publisher = {—}, % TODO
- address = {—} % TODO
  }
  ```

- ## License
-
- This model is released under the Apache 2.0 license, same as the base Qwen2.5-7B model.
  license: apache-2.0
  base_model: Qwen/Qwen2.5-7B
  library_name: peft
+ language:
+ - fr
  tags:
  - text-to-speech
  - lora
  - peft
+ - ssml
+ - qwen2.5
  pipeline_tag: text-generation
  ---

+ # 🗣️ French Text-to-Breaks LoRA Model
+
+ **hi-paris/ssml-text2breaks-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to predict natural pause locations in French text by adding symbolic `<break/>` markers.
+
+ This is the **first stage** of a two-step SSML cascade pipeline for improving French text-to-speech prosody control.
+
+ > 📄 **Paper**: *"Improving Synthetic Speech Quality via SSML Prosody Control"*
+ > **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
+ > **Conference**: ICNLSP 2025
+ > 🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/

  ## 🧩 Pipeline Overview

+ | Stage | Model | Purpose |
+ |-------|-------|---------|
+ | 1️⃣ | **hi-paris/ssml-text2breaks-fr-lora** | Predicts natural pause locations |
+ | 2️⃣ | [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora) | Converts breaks to full SSML with prosody |

  ## ✨ Example

  **Input:**
  ```
+ Bonjour comment allez-vous aujourd'hui ?
  ```

+ **Output:**
+ ```
+ Bonjour comment allez-vous aujourd'hui ?<break/>
+ ```

+ ## 🚀 Quick Start

+ ### Installation

+ ```bash
+ pip install torch transformers peft accelerate
+ ```

+ ### Basic Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from peft import PeftModel
+ import torch

  # Load base model and tokenizer
  base_model = AutoModelForCausalLM.from_pretrained(
      "Qwen/Qwen2.5-7B",
+     torch_dtype=torch.float16,
      device_map="auto"
  )
  tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

  # Load LoRA adapter
+ model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-text2breaks-fr-lora")

  # Prepare input
+ text = "Bonjour comment allez-vous aujourd'hui ?"
+ formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"

  # Generate
  inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
  )

  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ result = response.split("### SSML:\n")[-1].strip()
+ print(result)  # "Bonjour comment allez-vous aujourd'hui ?<break/>"
+ ```
+
+ ### Production Usage (Recommended)
+
+ For production use with memory optimization and full cascade, see our [inference repository](https://github.com/TimLukaHorstmann/cascading_model):
+
+ ```python
+ from text2breaks_inference import Text2BreaksInference
+
+ # Memory-efficient shared model approach
+ model = Text2BreaksInference()
+ result = model.predict("Bonjour comment allez-vous aujourd'hui ?")
  ```

+ ## 🔧 Full Cascade Example
+
+ ```python
+ from breaks2ssml_inference import CascadedInference
+
+ # Initialize full pipeline (memory efficient)
+ cascade = CascadedInference()
+
+ # Convert plain text directly to full SSML
+ text = "Bonjour comment allez-vous aujourd'hui ?"
+ ssml_output = cascade.predict(text)
+ print(ssml_output)
+ # Output: '<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>'
+ ```
+
+ ## 🧠 Model Details
+
+ - **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+ - **LoRA Rank**: 8, Alpha: 16
+ - **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+ - **Training**: 5 epochs, batch size 1 with gradient accumulation
+ - **Language**: French
+ - **Model Size**: 7B parameters (LoRA adapter: ~81MB)
+ - **License**: Apache 2.0
+
+ ## 📊 Performance
+
+ The model achieves high accuracy in predicting natural pause locations in French text, contributing to improved prosody in text-to-speech synthesis when combined with the second-stage model.
+
+ ## 🔗 Resources
+
+ - **Full Pipeline Code**: https://github.com/TimLukaHorstmann/cascading_model
+ - **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing)
+ - **Stage 2 Model**: [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)
+
+ ## 📖 Citation
+
+ ```bibtex
  @inproceedings{ould-ouali2025_improving,
    title = {Improving Synthetic Speech Quality via SSML Prosody Control},
    author = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
+ booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
    year = {2025},
+ url = {https://huggingface.co/hi-paris}
  }
  ```

+ ## 📜 License

+ Apache 2.0 License (same as the base Qwen2.5-7B model)
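The updated README's Basic Usage snippet elides the `model.generate(...)` call between diff hunks, but its prompt template and output parsing are fully visible. The sketch below factors that logic into plain functions so it can be checked without loading the 7B model; the helper names (`build_prompt`, `extract_breaks`) are illustrative and not part of the released code.

```python
# Prompt construction and decoded-output parsing from the README's
# inference snippet, isolated so the string logic is testable on its own.

# Template copied from the README's formatted_input f-string.
PROMPT_TEMPLATE = "### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"

def build_prompt(text: str) -> str:
    """Format raw text into the instruction prompt the adapter expects."""
    return PROMPT_TEMPLATE.format(text=text)

def extract_breaks(decoded: str) -> str:
    """Drop the echoed prompt from a decoded generation, keeping only the
    completion after the final '### SSML:' marker."""
    return decoded.split("### SSML:\n")[-1].strip()

prompt = build_prompt("Bonjour comment allez-vous aujourd'hui ?")
# A real run would pass `prompt` through tokenizer/model.generate; here we
# simulate a decoded response that echoes the prompt plus the completion.
decoded = prompt + "Bonjour comment allez-vous aujourd'hui ?<break/>"
print(extract_breaks(decoded))  # Bonjour comment allez-vous aujourd'hui ?<break/>
```

Because causal LMs echo the prompt, splitting on the last `### SSML:` marker is what isolates the generated break-annotated text.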
 
 
added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
-   "</tool_call>": 151658,
-   "<tool_call>": 151657,
-   "<|box_end|>": 151649,
-   "<|box_start|>": 151648,
-   "<|endoftext|>": 151643,
-   "<|file_sep|>": 151664,
-   "<|fim_middle|>": 151660,
-   "<|fim_pad|>": 151662,
-   "<|fim_prefix|>": 151659,
-   "<|fim_suffix|>": 151661,
-   "<|im_end|>": 151645,
-   "<|im_start|>": 151644,
-   "<|image_pad|>": 151655,
-   "<|object_ref_end|>": 151647,
-   "<|object_ref_start|>": 151646,
-   "<|quad_end|>": 151651,
-   "<|quad_start|>": 151650,
-   "<|repo_name|>": 151663,
-   "<|video_pad|>": 151656,
-   "<|vision_end|>": 151653,
-   "<|vision_pad|>": 151654,
-   "<|vision_start|>": 151652
- }
bigscience_T0_3B_ssml/added_tokens.json DELETED
@@ -1,105 +0,0 @@
- {
-   "</prosody>": 32101,
-   "<break/>": 32102,
-   "<extra_id_0>": 32099,
-   "<extra_id_10>": 32089,
-   "<extra_id_11>": 32088,
-   "<extra_id_12>": 32087,
-   "<extra_id_13>": 32086,
-   "<extra_id_14>": 32085,
-   "<extra_id_15>": 32084,
-   "<extra_id_16>": 32083,
-   "<extra_id_17>": 32082,
-   "<extra_id_18>": 32081,
-   "<extra_id_19>": 32080,
-   "<extra_id_1>": 32098,
-   "<extra_id_20>": 32079,
-   "<extra_id_21>": 32078,
-   "<extra_id_22>": 32077,
-   "<extra_id_23>": 32076,
-   "<extra_id_24>": 32075,
-   "<extra_id_25>": 32074,
-   "<extra_id_26>": 32073,
-   "<extra_id_27>": 32072,
-   "<extra_id_28>": 32071,
-   "<extra_id_29>": 32070,
-   "<extra_id_2>": 32097,
-   "<extra_id_30>": 32069,
-   "<extra_id_31>": 32068,
-   "<extra_id_32>": 32067,
-   "<extra_id_33>": 32066,
-   "<extra_id_34>": 32065,
-   "<extra_id_35>": 32064,
-   "<extra_id_36>": 32063,
-   "<extra_id_37>": 32062,
-   "<extra_id_38>": 32061,
-   "<extra_id_39>": 32060,
-   "<extra_id_3>": 32096,
-   "<extra_id_40>": 32059,
-   "<extra_id_41>": 32058,
-   "<extra_id_42>": 32057,
-   "<extra_id_43>": 32056,
-   "<extra_id_44>": 32055,
-   "<extra_id_45>": 32054,
-   "<extra_id_46>": 32053,
-   "<extra_id_47>": 32052,
-   "<extra_id_48>": 32051,
-   "<extra_id_49>": 32050,
-   "<extra_id_4>": 32095,
-   "<extra_id_50>": 32049,
-   "<extra_id_51>": 32048,
-   "<extra_id_52>": 32047,
-   "<extra_id_53>": 32046,
-   "<extra_id_54>": 32045,
-   "<extra_id_55>": 32044,
-   "<extra_id_56>": 32043,
-   "<extra_id_57>": 32042,
-   "<extra_id_58>": 32041,
-   "<extra_id_59>": 32040,
-   "<extra_id_5>": 32094,
-   "<extra_id_60>": 32039,
-   "<extra_id_61>": 32038,
-   "<extra_id_62>": 32037,
-   "<extra_id_63>": 32036,
-   "<extra_id_64>": 32035,
-   "<extra_id_65>": 32034,
-   "<extra_id_66>": 32033,
-   "<extra_id_67>": 32032,
-   "<extra_id_68>": 32031,
-   "<extra_id_69>": 32030,
-   "<extra_id_6>": 32093,
-   "<extra_id_70>": 32029,
-   "<extra_id_71>": 32028,
-   "<extra_id_72>": 32027,
-   "<extra_id_73>": 32026,
-   "<extra_id_74>": 32025,
-   "<extra_id_75>": 32024,
-   "<extra_id_76>": 32023,
-   "<extra_id_77>": 32022,
-   "<extra_id_78>": 32021,
-   "<extra_id_79>": 32020,
-   "<extra_id_7>": 32092,
-   "<extra_id_80>": 32019,
-   "<extra_id_81>": 32018,
-   "<extra_id_82>": 32017,
-   "<extra_id_83>": 32016,
-   "<extra_id_84>": 32015,
-   "<extra_id_85>": 32014,
-   "<extra_id_86>": 32013,
-   "<extra_id_87>": 32012,
-   "<extra_id_88>": 32011,
-   "<extra_id_89>": 32010,
-   "<extra_id_8>": 32091,
-   "<extra_id_90>": 32009,
-   "<extra_id_91>": 32008,
-   "<extra_id_92>": 32007,
-   "<extra_id_93>": 32006,
-   "<extra_id_94>": 32005,
-   "<extra_id_95>": 32004,
-   "<extra_id_96>": 32003,
-   "<extra_id_97>": 32002,
-   "<extra_id_98>": 32001,
-   "<extra_id_99>": 32000,
-   "<extra_id_9>": 32090,
-   "<prosody>": 32100
- }
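The deleted added_tokens.json above follows T5's convention: the 100 sentinel tokens `<extra_id_0>`..`<extra_id_99>` occupy IDs 32099 down to 32000, and the three SSML markers added for fine-tuning (`<prosody>`, `</prosody>`, `<break/>`) take the next free IDs. The stdlib-only sketch below rebuilds that map to make the numbering pattern explicit; it is an illustration, not project code.

```python
# Reconstruct the token-to-ID map of the deleted file from its two rules:
# sentinels count downward from 32099, SSML markers are appended after them.

def build_t0_ssml_added_tokens() -> dict:
    tokens = {f"<extra_id_{n}>": 32099 - n for n in range(100)}
    # SSML markers appended after the sentinel range
    tokens.update({"<prosody>": 32100, "</prosody>": 32101, "<break/>": 32102})
    return tokens

added = build_t0_ssml_added_tokens()
print(len(added))             # 103 entries, matching the deleted file
print(added["<extra_id_0>"])  # 32099
print(added["<break/>"])      # 32102
```

The same 103 pairs appear (string-sorted) in both the parent file and the checkpoint-12 copy deleted below.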
bigscience_T0_3B_ssml/checkpoint-12/added_tokens.json DELETED
@@ -1,105 +0,0 @@
- {
-   "</prosody>": 32101,
-   "<break/>": 32102,
-   "<extra_id_0>": 32099,
-   "<extra_id_10>": 32089,
-   "<extra_id_11>": 32088,
-   "<extra_id_12>": 32087,
-   "<extra_id_13>": 32086,
-   "<extra_id_14>": 32085,
-   "<extra_id_15>": 32084,
-   "<extra_id_16>": 32083,
-   "<extra_id_17>": 32082,
-   "<extra_id_18>": 32081,
-   "<extra_id_19>": 32080,
-   "<extra_id_1>": 32098,
-   "<extra_id_20>": 32079,
-   "<extra_id_21>": 32078,
-   "<extra_id_22>": 32077,
-   "<extra_id_23>": 32076,
-   "<extra_id_24>": 32075,
-   "<extra_id_25>": 32074,
-   "<extra_id_26>": 32073,
-   "<extra_id_27>": 32072,
-   "<extra_id_28>": 32071,
-   "<extra_id_29>": 32070,
-   "<extra_id_2>": 32097,
-   "<extra_id_30>": 32069,
-   "<extra_id_31>": 32068,
-   "<extra_id_32>": 32067,
-   "<extra_id_33>": 32066,
-   "<extra_id_34>": 32065,
-   "<extra_id_35>": 32064,
-   "<extra_id_36>": 32063,
-   "<extra_id_37>": 32062,
-   "<extra_id_38>": 32061,
-   "<extra_id_39>": 32060,
-   "<extra_id_3>": 32096,
-   "<extra_id_40>": 32059,
-   "<extra_id_41>": 32058,
-   "<extra_id_42>": 32057,
-   "<extra_id_43>": 32056,
-   "<extra_id_44>": 32055,
-   "<extra_id_45>": 32054,
-   "<extra_id_46>": 32053,
-   "<extra_id_47>": 32052,
-   "<extra_id_48>": 32051,
-   "<extra_id_49>": 32050,
-   "<extra_id_4>": 32095,
-   "<extra_id_50>": 32049,
-   "<extra_id_51>": 32048,
-   "<extra_id_52>": 32047,
-   "<extra_id_53>": 32046,
-   "<extra_id_54>": 32045,
-   "<extra_id_55>": 32044,
-   "<extra_id_56>": 32043,
-   "<extra_id_57>": 32042,
-   "<extra_id_58>": 32041,
-   "<extra_id_59>": 32040,
-   "<extra_id_5>": 32094,
-   "<extra_id_60>": 32039,
-   "<extra_id_61>": 32038,
-   "<extra_id_62>": 32037,
-   "<extra_id_63>": 32036,
-   "<extra_id_64>": 32035,
-   "<extra_id_65>": 32034,
-   "<extra_id_66>": 32033,
-   "<extra_id_67>": 32032,
-   "<extra_id_68>": 32031,
-   "<extra_id_69>": 32030,
-   "<extra_id_6>": 32093,
-   "<extra_id_70>": 32029,
-   "<extra_id_71>": 32028,
-   "<extra_id_72>": 32027,
-   "<extra_id_73>": 32026,
-   "<extra_id_74>": 32025,
-   "<extra_id_75>": 32024,
-   "<extra_id_76>": 32023,
-   "<extra_id_77>": 32022,
-   "<extra_id_78>": 32021,
-   "<extra_id_79>": 32020,
-   "<extra_id_7>": 32092,
-   "<extra_id_80>": 32019,
-   "<extra_id_81>": 32018,
-   "<extra_id_82>": 32017,
-   "<extra_id_83>": 32016,
-   "<extra_id_84>": 32015,
-   "<extra_id_85>": 32014,
-   "<extra_id_86>": 32013,
-   "<extra_id_87>": 32012,
-   "<extra_id_88>": 32011,
-   "<extra_id_89>": 32010,
-   "<extra_id_8>": 32091,
-   "<extra_id_90>": 32009,
-   "<extra_id_91>": 32008,
-   "<extra_id_92>": 32007,
-   "<extra_id_93>": 32006,
-   "<extra_id_94>": 32005,
-   "<extra_id_95>": 32004,
-   "<extra_id_96>": 32003,
-   "<extra_id_97>": 32002,
-   "<extra_id_98>": 32001,
-   "<extra_id_99>": 32000,
-   "<extra_id_9>": 32090,
-   "<prosody>": 32100
- }
bigscience_T0_3B_ssml/checkpoint-12/config.json DELETED
@@ -1,32 +0,0 @@
- {
-   "architectures": [
-     "T5ForConditionalGeneration"
-   ],
-   "classifier_dropout": 0.0,
-   "d_ff": 5120,
-   "d_kv": 64,
-   "d_model": 2048,
-   "decoder_start_token_id": 0,
-   "dense_act_fn": "gelu_new",
-   "dropout_rate": 0.1,
-   "eos_token_id": 1,
-   "feed_forward_proj": "gated-gelu",
-   "gradient_checkpointing": false,
-   "initializer_factor": 1.0,
-   "is_encoder_decoder": true,
-   "is_gated_act": true,
-   "layer_norm_epsilon": 1e-06,
-   "model_type": "t5",
-   "num_decoder_layers": 24,
-   "num_heads": 32,
-   "num_layers": 24,
-   "output_past": true,
-   "pad_token_id": 0,
-   "relative_attention_max_distance": 128,
-   "relative_attention_num_buckets": 32,
-   "tie_word_embeddings": false,
-   "torch_dtype": "float32",
-   "transformers_version": "4.52.2",
-   "use_cache": true,
-   "vocab_size": 32103
- }
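The deleted checkpoint config records `"vocab_size": 32103`, which is consistent with a T5 tokenizer of 32100 entries (32000 SentencePiece pieces plus the 100 `<extra_id_*>` sentinels) extended with the three SSML tokens listed in the deleted added_tokens.json files. A small cross-check, offered as an illustration of that bookkeeping rather than project code:

```python
import json

# Minimal fragment of the deleted config, just the fields needed here.
config_fragment = json.loads('{"model_type": "t5", "vocab_size": 32103}')

BASE_T5_TOKENIZER_SIZE = 32000 + 100  # SentencePiece pieces + <extra_id_*> sentinels
NEW_SSML_TOKENS = ["<prosody>", "</prosody>", "<break/>"]

# The embedding matrix was evidently resized to len(tokenizer) after
# the SSML tokens were added: 32100 + 3 = 32103.
assert config_fragment["vocab_size"] == BASE_T5_TOKENIZER_SIZE + len(NEW_SSML_TOKENS)
print(config_fragment["vocab_size"])  # 32103
```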
bigscience_T0_3B_ssml/checkpoint-12/generation_config.json DELETED
@@ -1,7 +0,0 @@
- {
-   "_from_model_config": true,
-   "decoder_start_token_id": 0,
-   "eos_token_id": 1,
-   "pad_token_id": 0,
-   "transformers_version": "4.52.2"
- }
bigscience_T0_3B_ssml/checkpoint-12/model.safetensors.index.json DELETED
@@ -1,565 +0,0 @@
- {
-   "metadata": {
-     "total_size": 11398619136
-   },
-   "weight_map": {
-     "decoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.1.EncDecAttention.k.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.0.layer.1.EncDecAttention.q.weight": "model-00001-of-00003.safetensors",
-     "decoder.block.0.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.0.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.0.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.0.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.1.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.10.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.11.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.12.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.13.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.14.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.15.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.16.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
-     "decoder.block.17.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
143
- "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
144
- "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
145
- "decoder.block.17.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
146
- "decoder.block.17.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
147
- "decoder.block.18.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
148
- "decoder.block.18.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
149
- "decoder.block.18.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
150
- "decoder.block.18.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
151
- "decoder.block.18.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
152
- "decoder.block.18.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
153
- "decoder.block.18.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
154
- "decoder.block.18.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
155
- "decoder.block.18.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
156
- "decoder.block.18.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
157
- "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
158
- "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
159
- "decoder.block.18.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
160
- "decoder.block.18.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
161
- "decoder.block.19.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
162
- "decoder.block.19.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
163
- "decoder.block.19.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
164
- "decoder.block.19.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
165
- "decoder.block.19.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
166
- "decoder.block.19.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
167
- "decoder.block.19.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
168
- "decoder.block.19.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
169
- "decoder.block.19.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
170
- "decoder.block.19.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
171
- "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
172
- "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
173
- "decoder.block.19.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
174
- "decoder.block.19.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
175
- "decoder.block.2.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
176
- "decoder.block.2.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
177
- "decoder.block.2.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
178
- "decoder.block.2.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
179
- "decoder.block.2.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
180
- "decoder.block.2.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
181
- "decoder.block.2.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
182
- "decoder.block.2.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
183
- "decoder.block.2.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
184
- "decoder.block.2.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
185
- "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
186
- "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
187
- "decoder.block.2.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
188
- "decoder.block.2.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
189
- "decoder.block.20.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
190
- "decoder.block.20.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
191
- "decoder.block.20.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
192
- "decoder.block.20.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
193
- "decoder.block.20.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
194
- "decoder.block.20.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
195
- "decoder.block.20.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
196
- "decoder.block.20.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
197
- "decoder.block.20.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
198
- "decoder.block.20.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
199
- "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
200
- "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
201
- "decoder.block.20.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
202
- "decoder.block.20.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
203
- "decoder.block.21.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
204
- "decoder.block.21.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
205
- "decoder.block.21.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
206
- "decoder.block.21.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
207
- "decoder.block.21.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
208
- "decoder.block.21.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
209
- "decoder.block.21.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
210
- "decoder.block.21.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
211
- "decoder.block.21.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
212
- "decoder.block.21.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
213
- "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
214
- "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
215
- "decoder.block.21.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
216
- "decoder.block.21.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
217
- "decoder.block.22.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
218
- "decoder.block.22.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
219
- "decoder.block.22.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
220
- "decoder.block.22.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
221
- "decoder.block.22.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
222
- "decoder.block.22.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
223
- "decoder.block.22.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
224
- "decoder.block.22.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
225
- "decoder.block.22.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
226
- "decoder.block.22.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
227
- "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
228
- "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
229
- "decoder.block.22.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
230
- "decoder.block.22.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
231
- "decoder.block.23.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
232
- "decoder.block.23.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
233
- "decoder.block.23.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
234
- "decoder.block.23.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
235
- "decoder.block.23.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
236
- "decoder.block.23.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
237
- "decoder.block.23.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
238
- "decoder.block.23.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
239
- "decoder.block.23.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
240
- "decoder.block.23.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
241
- "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
242
- "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
243
- "decoder.block.23.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
244
- "decoder.block.23.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
245
- "decoder.block.3.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
246
- "decoder.block.3.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
247
- "decoder.block.3.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
248
- "decoder.block.3.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
249
- "decoder.block.3.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
250
- "decoder.block.3.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
251
- "decoder.block.3.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
252
- "decoder.block.3.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
253
- "decoder.block.3.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
254
- "decoder.block.3.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
255
- "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
256
- "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
257
- "decoder.block.3.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
258
- "decoder.block.3.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
259
- "decoder.block.4.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
260
- "decoder.block.4.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
261
- "decoder.block.4.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
262
- "decoder.block.4.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
263
- "decoder.block.4.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
264
- "decoder.block.4.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
265
- "decoder.block.4.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
266
- "decoder.block.4.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
267
- "decoder.block.4.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
268
- "decoder.block.4.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
269
- "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
270
- "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
271
- "decoder.block.4.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
272
- "decoder.block.4.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
273
- "decoder.block.5.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
274
- "decoder.block.5.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
275
- "decoder.block.5.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
276
- "decoder.block.5.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
277
- "decoder.block.5.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
278
- "decoder.block.5.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
279
- "decoder.block.5.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
280
- "decoder.block.5.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
281
- "decoder.block.5.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
282
- "decoder.block.5.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
283
- "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
284
- "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
285
- "decoder.block.5.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
286
- "decoder.block.5.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
287
- "decoder.block.6.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
288
- "decoder.block.6.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
289
- "decoder.block.6.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
290
- "decoder.block.6.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
291
- "decoder.block.6.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
292
- "decoder.block.6.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
293
- "decoder.block.6.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
294
- "decoder.block.6.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
295
- "decoder.block.6.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
296
- "decoder.block.6.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
297
- "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
298
- "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
299
- "decoder.block.6.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
300
- "decoder.block.6.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
301
- "decoder.block.7.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
302
- "decoder.block.7.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
303
- "decoder.block.7.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
304
- "decoder.block.7.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
305
- "decoder.block.7.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
306
- "decoder.block.7.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
307
- "decoder.block.7.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
308
- "decoder.block.7.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
309
- "decoder.block.7.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
310
- "decoder.block.7.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
311
- "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
312
- "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
313
- "decoder.block.7.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
314
- "decoder.block.7.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
315
- "decoder.block.8.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
316
- "decoder.block.8.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
317
- "decoder.block.8.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
318
- "decoder.block.8.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
319
- "decoder.block.8.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
320
- "decoder.block.8.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
321
- "decoder.block.8.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
322
- "decoder.block.8.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
323
- "decoder.block.8.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
324
- "decoder.block.8.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
325
- "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
326
- "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
327
- "decoder.block.8.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
328
- "decoder.block.8.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
329
- "decoder.block.9.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
330
- "decoder.block.9.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
331
- "decoder.block.9.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
332
- "decoder.block.9.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
333
- "decoder.block.9.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
334
- "decoder.block.9.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
335
- "decoder.block.9.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
336
- "decoder.block.9.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
337
- "decoder.block.9.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
338
- "decoder.block.9.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
339
- "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
340
- "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
341
- "decoder.block.9.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
342
- "decoder.block.9.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
343
- "decoder.final_layer_norm.weight": "model-00003-of-00003.safetensors",
344
- "encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
345
- "encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
346
- "encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
347
- "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
348
- "encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
349
- "encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
350
- "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
351
- "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
352
- "encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
353
- "encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
354
- "encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
355
- "encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
356
- "encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
357
- "encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
358
- "encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
359
- "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
360
- "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
361
- "encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
362
- "encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
363
- "encoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
364
- "encoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
365
- "encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
366
- "encoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
367
- "encoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
368
- "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
369
- "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
370
- "encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
371
- "encoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
372
- "encoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
373
- "encoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
374
- "encoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
375
- "encoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
376
- "encoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
377
- "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
378
- "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
379
- "encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
380
- "encoder.block.11.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
381
- "encoder.block.12.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
382
- "encoder.block.12.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
383
- "encoder.block.12.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
384
- "encoder.block.12.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
385
- "encoder.block.12.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
386
- "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
387
- "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
388
- "encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
389
- "encoder.block.12.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
390
- "encoder.block.13.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
391
- "encoder.block.13.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
392
- "encoder.block.13.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
393
- "encoder.block.13.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
394
- "encoder.block.13.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
395
- "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
396
- "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
397
- "encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
398
- "encoder.block.13.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
399
- "encoder.block.14.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
400
- "encoder.block.14.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
401
- "encoder.block.14.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
402
- "encoder.block.14.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
403
- "encoder.block.14.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
404
- "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
405
- "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
406
- "encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
407
- "encoder.block.14.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
408
- "encoder.block.15.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
409
- "encoder.block.15.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
410
- "encoder.block.15.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
411
- "encoder.block.15.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
412
- "encoder.block.15.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
413
- "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
414
- "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
415
- "encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
416
- "encoder.block.15.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
417
- "encoder.block.16.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
418
- "encoder.block.16.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
419
- "encoder.block.16.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
420
- "encoder.block.16.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
421
- "encoder.block.16.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
422
- "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
423
- "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
424
- "encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
425
- "encoder.block.16.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
426
- "encoder.block.17.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
427
- "encoder.block.17.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
428
- "encoder.block.17.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
429
- "encoder.block.17.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
430
- "encoder.block.17.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
431
- "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
432
- "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
433
- "encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
434
- "encoder.block.17.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
435
- "encoder.block.18.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
436
- "encoder.block.18.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
437
- "encoder.block.18.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
438
- "encoder.block.18.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
439
- "encoder.block.18.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
440
- "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
441
- "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
442
- "encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
443
- "encoder.block.18.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
444
- "encoder.block.19.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.final_layer_norm.weight": "model-00001-of-00003.safetensors",
- "lm_head.weight": "model-00003-of-00003.safetensors",
- "shared.weight": "model-00001-of-00003.safetensors"
- }
- }
bigscience_T0_3B_ssml/checkpoint-12/special_tokens_map.json DELETED
@@ -1,125 +0,0 @@
- {
- "additional_special_tokens": [
- "<extra_id_0>",
- "<extra_id_1>",
- "<extra_id_2>",
- "<extra_id_3>",
- "<extra_id_4>",
- "<extra_id_5>",
- "<extra_id_6>",
- "<extra_id_7>",
- "<extra_id_8>",
- "<extra_id_9>",
- "<extra_id_10>",
- "<extra_id_11>",
- "<extra_id_12>",
- "<extra_id_13>",
- "<extra_id_14>",
- "<extra_id_15>",
- "<extra_id_16>",
- "<extra_id_17>",
- "<extra_id_18>",
- "<extra_id_19>",
- "<extra_id_20>",
- "<extra_id_21>",
- "<extra_id_22>",
- "<extra_id_23>",
- "<extra_id_24>",
- "<extra_id_25>",
- "<extra_id_26>",
- "<extra_id_27>",
- "<extra_id_28>",
- "<extra_id_29>",
- "<extra_id_30>",
- "<extra_id_31>",
- "<extra_id_32>",
- "<extra_id_33>",
- "<extra_id_34>",
- "<extra_id_35>",
- "<extra_id_36>",
- "<extra_id_37>",
- "<extra_id_38>",
- "<extra_id_39>",
- "<extra_id_40>",
- "<extra_id_41>",
- "<extra_id_42>",
- "<extra_id_43>",
- "<extra_id_44>",
- "<extra_id_45>",
- "<extra_id_46>",
- "<extra_id_47>",
- "<extra_id_48>",
- "<extra_id_49>",
- "<extra_id_50>",
- "<extra_id_51>",
- "<extra_id_52>",
- "<extra_id_53>",
- "<extra_id_54>",
- "<extra_id_55>",
- "<extra_id_56>",
- "<extra_id_57>",
- "<extra_id_58>",
- "<extra_id_59>",
- "<extra_id_60>",
- "<extra_id_61>",
- "<extra_id_62>",
- "<extra_id_63>",
- "<extra_id_64>",
- "<extra_id_65>",
- "<extra_id_66>",
- "<extra_id_67>",
- "<extra_id_68>",
- "<extra_id_69>",
- "<extra_id_70>",
- "<extra_id_71>",
- "<extra_id_72>",
- "<extra_id_73>",
- "<extra_id_74>",
- "<extra_id_75>",
- "<extra_id_76>",
- "<extra_id_77>",
- "<extra_id_78>",
- "<extra_id_79>",
- "<extra_id_80>",
- "<extra_id_81>",
- "<extra_id_82>",
- "<extra_id_83>",
- "<extra_id_84>",
- "<extra_id_85>",
- "<extra_id_86>",
- "<extra_id_87>",
- "<extra_id_88>",
- "<extra_id_89>",
- "<extra_id_90>",
- "<extra_id_91>",
- "<extra_id_92>",
- "<extra_id_93>",
- "<extra_id_94>",
- "<extra_id_95>",
- "<extra_id_96>",
- "<extra_id_97>",
- "<extra_id_98>",
- "<extra_id_99>"
- ],
- "eos_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "unk_token": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
bigscience_T0_3B_ssml/checkpoint-12/tokenizer_config.json DELETED
@@ -1,965 +0,0 @@
- {
- "add_prefix_space": true,
- "added_tokens_decoder": {
- "0": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "1": {
- "content": "</s>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "2": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32000": {
- "content": "<extra_id_99>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32001": {
- "content": "<extra_id_98>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32002": {
- "content": "<extra_id_97>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32003": {
- "content": "<extra_id_96>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32004": {
- "content": "<extra_id_95>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32005": {
- "content": "<extra_id_94>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32006": {
- "content": "<extra_id_93>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32007": {
- "content": "<extra_id_92>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32008": {
- "content": "<extra_id_91>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32009": {
- "content": "<extra_id_90>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32010": {
- "content": "<extra_id_89>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32011": {
- "content": "<extra_id_88>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32012": {
- "content": "<extra_id_87>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32013": {
- "content": "<extra_id_86>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32014": {
- "content": "<extra_id_85>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32015": {
- "content": "<extra_id_84>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32016": {
- "content": "<extra_id_83>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32017": {
- "content": "<extra_id_82>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32018": {
- "content": "<extra_id_81>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32019": {
- "content": "<extra_id_80>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32020": {
- "content": "<extra_id_79>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32021": {
- "content": "<extra_id_78>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32022": {
- "content": "<extra_id_77>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32023": {
- "content": "<extra_id_76>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32024": {
- "content": "<extra_id_75>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32025": {
- "content": "<extra_id_74>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32026": {
- "content": "<extra_id_73>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32027": {
- "content": "<extra_id_72>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32028": {
- "content": "<extra_id_71>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32029": {
- "content": "<extra_id_70>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32030": {
- "content": "<extra_id_69>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32031": {
- "content": "<extra_id_68>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32032": {
- "content": "<extra_id_67>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32033": {
- "content": "<extra_id_66>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32034": {
- "content": "<extra_id_65>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32035": {
- "content": "<extra_id_64>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32036": {
- "content": "<extra_id_63>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32037": {
- "content": "<extra_id_62>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32038": {
- "content": "<extra_id_61>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32039": {
- "content": "<extra_id_60>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32040": {
- "content": "<extra_id_59>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32041": {
- "content": "<extra_id_58>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32042": {
- "content": "<extra_id_57>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32043": {
- "content": "<extra_id_56>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32044": {
- "content": "<extra_id_55>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32045": {
- "content": "<extra_id_54>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32046": {
- "content": "<extra_id_53>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32047": {
- "content": "<extra_id_52>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32048": {
- "content": "<extra_id_51>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32049": {
- "content": "<extra_id_50>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32050": {
- "content": "<extra_id_49>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32051": {
- "content": "<extra_id_48>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32052": {
- "content": "<extra_id_47>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32053": {
- "content": "<extra_id_46>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32054": {
- "content": "<extra_id_45>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32055": {
- "content": "<extra_id_44>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32056": {
- "content": "<extra_id_43>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32057": {
- "content": "<extra_id_42>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32058": {
- "content": "<extra_id_41>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32059": {
- "content": "<extra_id_40>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32060": {
- "content": "<extra_id_39>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32061": {
- "content": "<extra_id_38>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32062": {
- "content": "<extra_id_37>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32063": {
- "content": "<extra_id_36>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32064": {
- "content": "<extra_id_35>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32065": {
- "content": "<extra_id_34>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32066": {
- "content": "<extra_id_33>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32067": {
- "content": "<extra_id_32>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32068": {
- "content": "<extra_id_31>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32069": {
- "content": "<extra_id_30>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32070": {
- "content": "<extra_id_29>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32071": {
- "content": "<extra_id_28>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32072": {
- "content": "<extra_id_27>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32073": {
- "content": "<extra_id_26>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32074": {
- "content": "<extra_id_25>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32075": {
- "content": "<extra_id_24>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32076": {
- "content": "<extra_id_23>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32077": {
- "content": "<extra_id_22>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32078": {
- "content": "<extra_id_21>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32079": {
- "content": "<extra_id_20>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32080": {
- "content": "<extra_id_19>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32081": {
- "content": "<extra_id_18>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32082": {
- "content": "<extra_id_17>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32083": {
- "content": "<extra_id_16>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32084": {
- "content": "<extra_id_15>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32085": {
- "content": "<extra_id_14>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32086": {
- "content": "<extra_id_13>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32087": {
- "content": "<extra_id_12>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32088": {
- "content": "<extra_id_11>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32089": {
- "content": "<extra_id_10>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32090": {
- "content": "<extra_id_9>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32091": {
- "content": "<extra_id_8>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32092": {
- "content": "<extra_id_7>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32093": {
- "content": "<extra_id_6>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32094": {
- "content": "<extra_id_5>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32095": {
- "content": "<extra_id_4>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32096": {
- "content": "<extra_id_3>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32097": {
- "content": "<extra_id_2>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32098": {
- "content": "<extra_id_1>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32099": {
- "content": "<extra_id_0>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32100": {
- "content": "<prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "32101": {
- "content": "</prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "32102": {
- "content": "<break/>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<extra_id_0>",
- "<extra_id_1>",
- "<extra_id_2>",
- "<extra_id_3>",
- "<extra_id_4>",
- "<extra_id_5>",
- "<extra_id_6>",
- "<extra_id_7>",
- "<extra_id_8>",
- "<extra_id_9>",
- "<extra_id_10>",
- "<extra_id_11>",
- "<extra_id_12>",
- "<extra_id_13>",
- "<extra_id_14>",
- "<extra_id_15>",
- "<extra_id_16>",
- "<extra_id_17>",
- "<extra_id_18>",
- "<extra_id_19>",
- "<extra_id_20>",
- "<extra_id_21>",
- "<extra_id_22>",
- "<extra_id_23>",
- "<extra_id_24>",
- "<extra_id_25>",
- "<extra_id_26>",
- "<extra_id_27>",
- "<extra_id_28>",
- "<extra_id_29>",
- "<extra_id_30>",
- "<extra_id_31>",
- "<extra_id_32>",
- "<extra_id_33>",
- "<extra_id_34>",
- "<extra_id_35>",
- "<extra_id_36>",
- "<extra_id_37>",
- "<extra_id_38>",
- "<extra_id_39>",
- "<extra_id_40>",
- "<extra_id_41>",
- "<extra_id_42>",
- "<extra_id_43>",
- "<extra_id_44>",
- "<extra_id_45>",
- "<extra_id_46>",
- "<extra_id_47>",
- "<extra_id_48>",
- "<extra_id_49>",
- "<extra_id_50>",
- "<extra_id_51>",
- "<extra_id_52>",
- "<extra_id_53>",
- "<extra_id_54>",
- "<extra_id_55>",
- "<extra_id_56>",
- "<extra_id_57>",
- "<extra_id_58>",
- "<extra_id_59>",
- "<extra_id_60>",
- "<extra_id_61>",
- "<extra_id_62>",
- "<extra_id_63>",
- "<extra_id_64>",
- "<extra_id_65>",
- "<extra_id_66>",
- "<extra_id_67>",
- "<extra_id_68>",
- "<extra_id_69>",
- "<extra_id_70>",
925
- "<extra_id_71>",
926
- "<extra_id_72>",
927
- "<extra_id_73>",
928
- "<extra_id_74>",
929
- "<extra_id_75>",
930
- "<extra_id_76>",
931
- "<extra_id_77>",
932
- "<extra_id_78>",
933
- "<extra_id_79>",
934
- "<extra_id_80>",
935
- "<extra_id_81>",
936
- "<extra_id_82>",
937
- "<extra_id_83>",
938
- "<extra_id_84>",
939
- "<extra_id_85>",
940
- "<extra_id_86>",
941
- "<extra_id_87>",
942
- "<extra_id_88>",
943
- "<extra_id_89>",
944
- "<extra_id_90>",
945
- "<extra_id_91>",
946
- "<extra_id_92>",
947
- "<extra_id_93>",
948
- "<extra_id_94>",
949
- "<extra_id_95>",
950
- "<extra_id_96>",
951
- "<extra_id_97>",
952
- "<extra_id_98>",
953
- "<extra_id_99>"
954
- ],
955
- "clean_up_tokenization_spaces": false,
956
- "eos_token": "</s>",
957
- "extra_ids": 100,
958
- "extra_special_tokens": {},
959
- "legacy": true,
960
- "model_max_length": 512,
961
- "pad_token": "<pad>",
962
- "sp_model_kwargs": {},
963
- "tokenizer_class": "T5Tokenizer",
964
- "unk_token": "<unk>"
965
- }
bigscience_T0_3B_ssml/checkpoint-12/trainer_state.json DELETED
@@ -1,41 +0,0 @@
- {
- "best_global_step": null,
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 3.0,
- "eval_steps": 100,
- "global_step": 12,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 2.533333333333333,
- "grad_norm": 2082258.875,
- "learning_rate": 8.181818181818181e-06,
- "loss": 10.7988,
- "step": 10
- }
- ],
- "logging_steps": 10,
- "max_steps": 12,
- "num_input_tokens_seen": 0,
- "num_train_epochs": 3,
- "save_steps": 500,
- "stateful_callbacks": {
- "TrainerControl": {
- "args": {
- "should_epoch_stop": false,
- "should_evaluate": false,
- "should_log": false,
- "should_save": true,
- "should_training_stop": true
- },
- "attributes": {}
- }
- },
- "total_flos": 2950494714593280.0,
- "train_batch_size": 8,
- "trial_name": null,
- "trial_params": null
- }
bigscience_T0_3B_ssml/config.json DELETED
@@ -1,32 +0,0 @@
- {
- "architectures": [
- "T5ForConditionalGeneration"
- ],
- "classifier_dropout": 0.0,
- "d_ff": 5120,
- "d_kv": 64,
- "d_model": 2048,
- "decoder_start_token_id": 0,
- "dense_act_fn": "gelu_new",
- "dropout_rate": 0.1,
- "eos_token_id": 1,
- "feed_forward_proj": "gated-gelu",
- "gradient_checkpointing": false,
- "initializer_factor": 1.0,
- "is_encoder_decoder": true,
- "is_gated_act": true,
- "layer_norm_epsilon": 1e-06,
- "model_type": "t5",
- "num_decoder_layers": 24,
- "num_heads": 32,
- "num_layers": 24,
- "output_past": true,
- "pad_token_id": 0,
- "relative_attention_max_distance": 128,
- "relative_attention_num_buckets": 32,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.52.2",
- "use_cache": true,
- "vocab_size": 32103
- }
bigscience_T0_3B_ssml/generation_config.json DELETED
@@ -1,7 +0,0 @@
- {
- "_from_model_config": true,
- "decoder_start_token_id": 0,
- "eos_token_id": 1,
- "pad_token_id": 0,
- "transformers_version": "4.52.2"
- }
bigscience_T0_3B_ssml/model.safetensors.index.json DELETED
@@ -1,565 +0,0 @@
- {
- "metadata": {
- "total_size": 11398619136
- },
- "weight_map": {
- "decoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.k.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.q.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.19.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.19.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
293
- "decoder.block.6.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
294
- "decoder.block.6.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
295
- "decoder.block.6.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
296
- "decoder.block.6.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
297
- "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
298
- "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
299
- "decoder.block.6.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
300
- "decoder.block.6.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
301
- "decoder.block.7.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
302
- "decoder.block.7.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
303
- "decoder.block.7.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
304
- "decoder.block.7.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
305
- "decoder.block.7.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
306
- "decoder.block.7.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
307
- "decoder.block.7.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
308
- "decoder.block.7.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
309
- "decoder.block.7.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
310
- "decoder.block.7.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
311
- "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
312
- "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
313
- "decoder.block.7.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
314
- "decoder.block.7.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
315
- "decoder.block.8.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
316
- "decoder.block.8.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
317
- "decoder.block.8.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
318
- "decoder.block.8.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
319
- "decoder.block.8.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
320
- "decoder.block.8.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
321
- "decoder.block.8.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
322
- "decoder.block.8.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
323
- "decoder.block.8.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
324
- "decoder.block.8.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
325
- "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
326
- "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
327
- "decoder.block.8.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
328
- "decoder.block.8.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
329
- "decoder.block.9.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
330
- "decoder.block.9.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
331
- "decoder.block.9.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
332
- "decoder.block.9.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
333
- "decoder.block.9.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
334
- "decoder.block.9.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
335
- "decoder.block.9.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
336
- "decoder.block.9.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
337
- "decoder.block.9.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
338
- "decoder.block.9.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
339
- "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
340
- "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
341
- "decoder.block.9.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
342
- "decoder.block.9.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
343
- "decoder.final_layer_norm.weight": "model-00003-of-00003.safetensors",
344
- "encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
345
- "encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
346
- "encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
347
- "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
348
- "encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
349
- "encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
350
- "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
351
- "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
352
- "encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
353
- "encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
354
- "encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
355
- "encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
356
- "encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
357
- "encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
358
- "encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
359
- "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
360
- "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
361
- "encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
362
- "encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
363
- "encoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
364
- "encoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
365
- "encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
366
- "encoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
367
- "encoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
368
- "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
369
- "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
370
- "encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
371
- "encoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
372
- "encoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
373
- "encoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
374
- "encoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
375
- "encoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
376
- "encoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
377
- "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
378
- "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
379
- "encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
380
- "encoder.block.11.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
381
- "encoder.block.12.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
382
- "encoder.block.12.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
383
- "encoder.block.12.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
384
- "encoder.block.12.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
385
- "encoder.block.12.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
386
- "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
387
- "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
388
- "encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
389
- "encoder.block.12.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
390
- "encoder.block.13.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
391
- "encoder.block.13.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
392
- "encoder.block.13.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
393
- "encoder.block.13.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
394
- "encoder.block.13.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
395
- "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
396
- "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
397
- "encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
398
- "encoder.block.13.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
399
- "encoder.block.14.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
400
- "encoder.block.14.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
401
- "encoder.block.14.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
402
- "encoder.block.14.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
403
- "encoder.block.14.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
404
- "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
405
- "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
406
- "encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
407
- "encoder.block.14.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
408
- "encoder.block.15.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
409
- "encoder.block.15.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
410
- "encoder.block.15.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
411
- "encoder.block.15.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
412
- "encoder.block.15.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
413
- "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
414
- "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
415
- "encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
416
- "encoder.block.15.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
417
- "encoder.block.16.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
418
- "encoder.block.16.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
419
- "encoder.block.16.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
420
- "encoder.block.16.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
421
- "encoder.block.16.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
422
- "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
423
- "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
424
- "encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
425
- "encoder.block.16.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
426
- "encoder.block.17.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
427
- "encoder.block.17.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
428
- "encoder.block.17.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
429
- "encoder.block.17.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
430
- "encoder.block.17.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
431
- "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
432
- "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
433
- "encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
434
- "encoder.block.17.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
435
- "encoder.block.18.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
436
- "encoder.block.18.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
437
- "encoder.block.18.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
438
- "encoder.block.18.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
439
- "encoder.block.18.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
440
- "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
441
- "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
442
- "encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
443
- "encoder.block.18.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
444
- "encoder.block.19.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
445
- "encoder.block.19.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
446
- "encoder.block.19.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
447
- "encoder.block.19.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
448
- "encoder.block.19.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
449
- "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
450
- "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
451
- "encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
452
- "encoder.block.19.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
453
- "encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
454
- "encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
455
- "encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
456
- "encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
457
- "encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
458
- "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
459
- "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
460
- "encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
461
- "encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
462
- "encoder.block.20.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
463
- "encoder.block.20.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
464
- "encoder.block.20.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
465
- "encoder.block.20.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
466
- "encoder.block.20.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
467
- "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
468
- "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
469
- "encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
470
- "encoder.block.20.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
471
- "encoder.block.21.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
472
- "encoder.block.21.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
473
- "encoder.block.21.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
474
- "encoder.block.21.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
475
- "encoder.block.21.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
476
- "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
477
- "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
478
- "encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
479
- "encoder.block.21.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
480
- "encoder.block.22.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
481
- "encoder.block.22.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
482
- "encoder.block.22.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
483
- "encoder.block.22.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
484
- "encoder.block.22.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
485
- "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
486
- "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
487
- "encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
488
- "encoder.block.22.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
489
- "encoder.block.23.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
490
- "encoder.block.23.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
491
- "encoder.block.23.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
492
- "encoder.block.23.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
493
- "encoder.block.23.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
494
- "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
495
- "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
496
- "encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
497
- "encoder.block.23.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
498
- "encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
499
- "encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
500
- "encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
501
- "encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
502
- "encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
503
- "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
504
- "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
505
- "encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
506
- "encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
507
- "encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
508
- "encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
509
- "encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
510
- "encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
511
- "encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
512
- "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
513
- "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
514
- "encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
515
- "encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
516
- "encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
517
- "encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
518
- "encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
519
- "encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
520
- "encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
521
- "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
522
- "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
523
- "encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
524
- "encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
525
- "encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
526
- "encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
527
- "encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
528
- "encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
529
- "encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
530
- "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
531
- "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
532
- "encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
533
- "encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
534
- "encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
535
- "encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
536
- "encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
537
- "encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
538
- "encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
539
- "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
540
- "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
541
- "encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
542
- "encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
543
- "encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
544
- "encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
545
- "encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
546
- "encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
547
- "encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
548
- "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
549
- "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
550
- "encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
551
- "encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
552
- "encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
553
- "encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
554
- "encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
555
- "encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
556
- "encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
557
- "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
558
- "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
559
- "encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
560
- "encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
561
- "encoder.final_layer_norm.weight": "model-00001-of-00003.safetensors",
562
- "lm_head.weight": "model-00003-of-00003.safetensors",
563
- "shared.weight": "model-00001-of-00003.safetensors"
564
- }
565
- }
 
bigscience_T0_3B_ssml/special_tokens_map.json DELETED
@@ -1,125 +0,0 @@
- {
- "additional_special_tokens": [
- "<extra_id_0>",
- "<extra_id_1>",
- "<extra_id_2>",
- "<extra_id_3>",
- "<extra_id_4>",
- "<extra_id_5>",
- "<extra_id_6>",
- "<extra_id_7>",
- "<extra_id_8>",
- "<extra_id_9>",
- "<extra_id_10>",
- "<extra_id_11>",
- "<extra_id_12>",
- "<extra_id_13>",
- "<extra_id_14>",
- "<extra_id_15>",
- "<extra_id_16>",
- "<extra_id_17>",
- "<extra_id_18>",
- "<extra_id_19>",
- "<extra_id_20>",
- "<extra_id_21>",
- "<extra_id_22>",
- "<extra_id_23>",
- "<extra_id_24>",
- "<extra_id_25>",
- "<extra_id_26>",
- "<extra_id_27>",
- "<extra_id_28>",
- "<extra_id_29>",
- "<extra_id_30>",
- "<extra_id_31>",
- "<extra_id_32>",
- "<extra_id_33>",
- "<extra_id_34>",
- "<extra_id_35>",
- "<extra_id_36>",
- "<extra_id_37>",
- "<extra_id_38>",
- "<extra_id_39>",
- "<extra_id_40>",
- "<extra_id_41>",
- "<extra_id_42>",
- "<extra_id_43>",
- "<extra_id_44>",
- "<extra_id_45>",
- "<extra_id_46>",
- "<extra_id_47>",
- "<extra_id_48>",
- "<extra_id_49>",
- "<extra_id_50>",
- "<extra_id_51>",
- "<extra_id_52>",
- "<extra_id_53>",
- "<extra_id_54>",
- "<extra_id_55>",
- "<extra_id_56>",
- "<extra_id_57>",
- "<extra_id_58>",
- "<extra_id_59>",
- "<extra_id_60>",
- "<extra_id_61>",
- "<extra_id_62>",
- "<extra_id_63>",
- "<extra_id_64>",
- "<extra_id_65>",
- "<extra_id_66>",
- "<extra_id_67>",
- "<extra_id_68>",
- "<extra_id_69>",
- "<extra_id_70>",
- "<extra_id_71>",
- "<extra_id_72>",
- "<extra_id_73>",
- "<extra_id_74>",
- "<extra_id_75>",
- "<extra_id_76>",
- "<extra_id_77>",
- "<extra_id_78>",
- "<extra_id_79>",
- "<extra_id_80>",
- "<extra_id_81>",
- "<extra_id_82>",
- "<extra_id_83>",
- "<extra_id_84>",
- "<extra_id_85>",
- "<extra_id_86>",
- "<extra_id_87>",
- "<extra_id_88>",
- "<extra_id_89>",
- "<extra_id_90>",
- "<extra_id_91>",
- "<extra_id_92>",
- "<extra_id_93>",
- "<extra_id_94>",
- "<extra_id_95>",
- "<extra_id_96>",
- "<extra_id_97>",
- "<extra_id_98>",
- "<extra_id_99>"
- ],
- "eos_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "unk_token": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
 
bigscience_T0_3B_ssml/tokenizer_config.json DELETED
@@ -1,965 +0,0 @@
- {
- "add_prefix_space": true,
- "added_tokens_decoder": {
- "0": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "1": {
- "content": "</s>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "2": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32000": {
- "content": "<extra_id_99>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32001": {
- "content": "<extra_id_98>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32002": {
- "content": "<extra_id_97>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32003": {
- "content": "<extra_id_96>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32004": {
- "content": "<extra_id_95>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32005": {
- "content": "<extra_id_94>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32006": {
- "content": "<extra_id_93>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32007": {
- "content": "<extra_id_92>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32008": {
- "content": "<extra_id_91>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32009": {
- "content": "<extra_id_90>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32010": {
- "content": "<extra_id_89>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32011": {
- "content": "<extra_id_88>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32012": {
- "content": "<extra_id_87>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32013": {
- "content": "<extra_id_86>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32014": {
- "content": "<extra_id_85>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32015": {
- "content": "<extra_id_84>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32016": {
- "content": "<extra_id_83>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32017": {
- "content": "<extra_id_82>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32018": {
- "content": "<extra_id_81>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32019": {
- "content": "<extra_id_80>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32020": {
- "content": "<extra_id_79>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32021": {
- "content": "<extra_id_78>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32022": {
- "content": "<extra_id_77>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32023": {
- "content": "<extra_id_76>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32024": {
- "content": "<extra_id_75>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32025": {
- "content": "<extra_id_74>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32026": {
- "content": "<extra_id_73>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32027": {
- "content": "<extra_id_72>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32028": {
- "content": "<extra_id_71>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32029": {
- "content": "<extra_id_70>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32030": {
- "content": "<extra_id_69>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32031": {
- "content": "<extra_id_68>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32032": {
- "content": "<extra_id_67>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32033": {
- "content": "<extra_id_66>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32034": {
- "content": "<extra_id_65>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32035": {
- "content": "<extra_id_64>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32036": {
- "content": "<extra_id_63>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32037": {
- "content": "<extra_id_62>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32038": {
- "content": "<extra_id_61>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32039": {
- "content": "<extra_id_60>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32040": {
- "content": "<extra_id_59>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32041": {
- "content": "<extra_id_58>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32042": {
- "content": "<extra_id_57>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32043": {
- "content": "<extra_id_56>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32044": {
- "content": "<extra_id_55>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32045": {
- "content": "<extra_id_54>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32046": {
- "content": "<extra_id_53>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32047": {
- "content": "<extra_id_52>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32048": {
- "content": "<extra_id_51>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32049": {
- "content": "<extra_id_50>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32050": {
- "content": "<extra_id_49>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32051": {
- "content": "<extra_id_48>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32052": {
- "content": "<extra_id_47>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32053": {
- "content": "<extra_id_46>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32054": {
- "content": "<extra_id_45>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32055": {
- "content": "<extra_id_44>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32056": {
- "content": "<extra_id_43>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32057": {
- "content": "<extra_id_42>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32058": {
- "content": "<extra_id_41>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32059": {
- "content": "<extra_id_40>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32060": {
- "content": "<extra_id_39>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32061": {
- "content": "<extra_id_38>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32062": {
- "content": "<extra_id_37>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32063": {
- "content": "<extra_id_36>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32064": {
- "content": "<extra_id_35>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32065": {
- "content": "<extra_id_34>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32066": {
- "content": "<extra_id_33>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32067": {
- "content": "<extra_id_32>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32068": {
- "content": "<extra_id_31>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32069": {
- "content": "<extra_id_30>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32070": {
- "content": "<extra_id_29>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32071": {
- "content": "<extra_id_28>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32072": {
- "content": "<extra_id_27>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32073": {
- "content": "<extra_id_26>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32074": {
- "content": "<extra_id_25>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32075": {
- "content": "<extra_id_24>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32076": {
- "content": "<extra_id_23>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32077": {
- "content": "<extra_id_22>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32078": {
- "content": "<extra_id_21>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32079": {
- "content": "<extra_id_20>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32080": {
- "content": "<extra_id_19>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32081": {
- "content": "<extra_id_18>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32082": {
- "content": "<extra_id_17>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32083": {
- "content": "<extra_id_16>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32084": {
- "content": "<extra_id_15>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32085": {
- "content": "<extra_id_14>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32086": {
- "content": "<extra_id_13>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32087": {
- "content": "<extra_id_12>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32088": {
- "content": "<extra_id_11>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32089": {
- "content": "<extra_id_10>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32090": {
- "content": "<extra_id_9>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32091": {
- "content": "<extra_id_8>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32092": {
- "content": "<extra_id_7>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32093": {
- "content": "<extra_id_6>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32094": {
- "content": "<extra_id_5>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32095": {
- "content": "<extra_id_4>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32096": {
- "content": "<extra_id_3>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32097": {
- "content": "<extra_id_2>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32098": {
- "content": "<extra_id_1>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32099": {
- "content": "<extra_id_0>",
- "lstrip": true,
- "normalized": false,
- "rstrip": true,
- "single_word": false,
- "special": true
- },
- "32100": {
- "content": "<prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "32101": {
- "content": "</prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "32102": {
- "content": "<break/>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<extra_id_0>",
- "<extra_id_1>",
- "<extra_id_2>",
- "<extra_id_3>",
- "<extra_id_4>",
- "<extra_id_5>",
- "<extra_id_6>",
- "<extra_id_7>",
- "<extra_id_8>",
- "<extra_id_9>",
- "<extra_id_10>",
- "<extra_id_11>",
- "<extra_id_12>",
- "<extra_id_13>",
- "<extra_id_14>",
- "<extra_id_15>",
- "<extra_id_16>",
- "<extra_id_17>",
- "<extra_id_18>",
- "<extra_id_19>",
- "<extra_id_20>",
- "<extra_id_21>",
- "<extra_id_22>",
- "<extra_id_23>",
- "<extra_id_24>",
- "<extra_id_25>",
- "<extra_id_26>",
- "<extra_id_27>",
- "<extra_id_28>",
- "<extra_id_29>",
- "<extra_id_30>",
- "<extra_id_31>",
- "<extra_id_32>",
- "<extra_id_33>",
- "<extra_id_34>",
- "<extra_id_35>",
- "<extra_id_36>",
- "<extra_id_37>",
- "<extra_id_38>",
- "<extra_id_39>",
- "<extra_id_40>",
- "<extra_id_41>",
- "<extra_id_42>",
- "<extra_id_43>",
- "<extra_id_44>",
- "<extra_id_45>",
- "<extra_id_46>",
- "<extra_id_47>",
- "<extra_id_48>",
- "<extra_id_49>",
- "<extra_id_50>",
- "<extra_id_51>",
- "<extra_id_52>",
- "<extra_id_53>",
- "<extra_id_54>",
- "<extra_id_55>",
- "<extra_id_56>",
- "<extra_id_57>",
- "<extra_id_58>",
- "<extra_id_59>",
- "<extra_id_60>",
- "<extra_id_61>",
- "<extra_id_62>",
- "<extra_id_63>",
- "<extra_id_64>",
- "<extra_id_65>",
- "<extra_id_66>",
- "<extra_id_67>",
- "<extra_id_68>",
- "<extra_id_69>",
- "<extra_id_70>",
- "<extra_id_71>",
- "<extra_id_72>",
- "<extra_id_73>",
- "<extra_id_74>",
- "<extra_id_75>",
- "<extra_id_76>",
- "<extra_id_77>",
- "<extra_id_78>",
- "<extra_id_79>",
- "<extra_id_80>",
- "<extra_id_81>",
- "<extra_id_82>",
- "<extra_id_83>",
- "<extra_id_84>",
- "<extra_id_85>",
- "<extra_id_86>",
- "<extra_id_87>",
- "<extra_id_88>",
- "<extra_id_89>",
- "<extra_id_90>",
- "<extra_id_91>",
- "<extra_id_92>",
- "<extra_id_93>",
- "<extra_id_94>",
- "<extra_id_95>",
- "<extra_id_96>",
- "<extra_id_97>",
- "<extra_id_98>",
- "<extra_id_99>"
- ],
- "clean_up_tokenization_spaces": false,
- "eos_token": "</s>",
- "extra_ids": 100,
- "extra_special_tokens": {},
- "legacy": true,
- "model_max_length": 512,
- "pad_token": "<pad>",
- "sp_model_kwargs": {},
- "tokenizer_class": "T5Tokenizer",
- "unk_token": "<unk>"
- }
chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0]['role'] == 'system' %}
- {{- messages[0]['content'] }}
- {%- else %}
- {{- 'You are a helpful assistant.' }}
- {%- endif %}
- {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
- {%- if messages[0]['role'] == 'system' %}
- {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
- {%- else %}
- {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- for message in messages %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
- {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {{- '<|im_start|>' + message.role }}
- {%- if message.content %}
- {{- '\n' + message.content }}
- {%- endif %}
- {%- for tool_call in message.tool_calls %}
- {%- if tool_call.function is defined %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '\n<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {{- tool_call.arguments | tojson }}
- {{- '}\n</tool_call>' }}
- {%- endfor %}
- {{- '<|im_end|>\n' }}
- {%- elif message.role == "tool" %}
- {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
- {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- message.content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
- {{- '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}
checkpoint-735/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- base_model: Qwen/Qwen2.5-7B
- library_name: peft
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.15.2
checkpoint-735/adapter_config.json DELETED
@@ -1,39 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "Qwen/Qwen2.5-7B",
- "bias": "none",
- "corda_config": null,
- "eva_config": null,
- "exclude_modules": null,
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layer_replication": null,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 16,
- "lora_bias": false,
- "lora_dropout": 0.1,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 8,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "v_proj",
- "o_proj",
- "down_proj",
- "q_proj",
- "gate_proj",
- "k_proj",
- "up_proj"
- ],
- "task_type": "CAUSAL_LM",
- "trainable_token_indices": null,
- "use_dora": false,
- "use_rslora": false
- }
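
The deleted `adapter_config.json` above describes a rank-8 LoRA adapter (`lora_alpha` 16, dropout 0.1) over seven projection matrices of Qwen/Qwen2.5-7B. As a rough, self-contained sketch of what that implies for adapter size (the matrix shape below is a hypothetical placeholder, not read from the checkpoint):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int = 8) -> int:
    """LoRA trains a low-rank update B @ A with A: (r, d_in) and
    B: (d_out, r), so each adapted matrix adds r * (d_in + d_out)
    trainable parameters instead of d_in * d_out."""
    return r * (d_in + d_out)

# Hypothetical 3584x3584 projection at the config's r=8:
print(lora_trainable_params(3584, 3584))  # 57344
```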
 
checkpoint-735/added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
- "</tool_call>": 151658,
- "<tool_call>": 151657,
- "<|box_end|>": 151649,
- "<|box_start|>": 151648,
- "<|endoftext|>": 151643,
- "<|file_sep|>": 151664,
- "<|fim_middle|>": 151660,
- "<|fim_pad|>": 151662,
- "<|fim_prefix|>": 151659,
- "<|fim_suffix|>": 151661,
- "<|im_end|>": 151645,
- "<|im_start|>": 151644,
- "<|image_pad|>": 151655,
- "<|object_ref_end|>": 151647,
- "<|object_ref_start|>": 151646,
- "<|quad_end|>": 151651,
- "<|quad_start|>": 151650,
- "<|repo_name|>": 151663,
- "<|video_pad|>": 151656,
- "<|vision_end|>": 151653,
- "<|vision_pad|>": 151654,
- "<|vision_start|>": 151652
- }
 
checkpoint-735/chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
-     {{- '<|im_start|>system\n' }}
-     {%- if messages[0]['role'] == 'system' %}
-         {{- messages[0]['content'] }}
-     {%- else %}
-         {{- 'You are a helpful assistant.' }}
-     {%- endif %}
-     {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
-     {%- for tool in tools %}
-         {{- "\n" }}
-         {{- tool | tojson }}
-     {%- endfor %}
-     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
-     {%- if messages[0]['role'] == 'system' %}
-         {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
-     {%- else %}
-         {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
-     {%- endif %}
- {%- endif %}
- {%- for message in messages %}
-     {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
-         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
-     {%- elif message.role == "assistant" %}
-         {{- '<|im_start|>' + message.role }}
-         {%- if message.content %}
-             {{- '\n' + message.content }}
-         {%- endif %}
-         {%- for tool_call in message.tool_calls %}
-             {%- if tool_call.function is defined %}
-                 {%- set tool_call = tool_call.function %}
-             {%- endif %}
-             {{- '\n<tool_call>\n{"name": "' }}
-             {{- tool_call.name }}
-             {{- '", "arguments": ' }}
-             {{- tool_call.arguments | tojson }}
-             {{- '}\n</tool_call>' }}
-         {%- endfor %}
-         {{- '<|im_end|>\n' }}
-     {%- elif message.role == "tool" %}
-         {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
-             {{- '<|im_start|>user' }}
-         {%- endif %}
-         {{- '\n<tool_response>\n' }}
-         {{- message.content }}
-         {{- '\n</tool_response>' }}
-         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
-             {{- '<|im_end|>\n' }}
-         {%- endif %}
-     {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
-     {{- '<|im_start|>assistant\n' }}
- {%- endif %}
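
The deleted `chat_template.jinja` is the standard Qwen2.5 ChatML template. Ignoring its tool-calling branches, the output it produces can be sketched in plain Python (a simplified re-implementation for illustration, not the template itself):

```python
def render_chatml(messages, add_generation_prompt=True):
    # Mirrors the no-tools path of the template: an initial system turn
    # (with a default message when none is supplied), then one
    # <|im_start|>role ... <|im_end|> block per remaining message.
    parts = []
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        messages = messages[1:]
    else:
        parts.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml([{"role": "user", "content": "Hello"}]))
```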
 
checkpoint-735/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
checkpoint-735/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "eos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
 
checkpoint-735/tokenizer_config.json DELETED
@@ -1,207 +0,0 @@
- {
- "add_bos_token": false,
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "151643": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151644": {
- "content": "<|im_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151645": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151646": {
- "content": "<|object_ref_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151647": {
- "content": "<|object_ref_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151648": {
- "content": "<|box_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151649": {
- "content": "<|box_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151650": {
- "content": "<|quad_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151651": {
- "content": "<|quad_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151652": {
- "content": "<|vision_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151653": {
- "content": "<|vision_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151654": {
- "content": "<|vision_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151655": {
- "content": "<|image_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151656": {
- "content": "<|video_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151657": {
- "content": "<tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151658": {
- "content": "</tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151659": {
- "content": "<|fim_prefix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151660": {
- "content": "<|fim_middle|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151661": {
- "content": "<|fim_suffix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151662": {
- "content": "<|fim_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151663": {
- "content": "<|repo_name|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151664": {
- "content": "<|file_sep|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "bos_token": null,
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|endoftext|>",
- "errors": "replace",
- "extra_special_tokens": {},
- "model_max_length": 131072,
- "pad_token": "<|endoftext|>",
- "split_special_tokens": false,
- "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }
 
checkpoint-735/trainer_state.json DELETED
@@ -1,286 +0,0 @@
- {
- "best_global_step": null,
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 5.0,
- "eval_steps": 10000,
- "global_step": 735,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.13646055437100213,
- "grad_norm": 0.14500385522842407,
- "learning_rate": 0.00011399999999999999,
- "loss": 0.1268,
- "step": 20
- },
- {
- "epoch": 0.27292110874200426,
- "grad_norm": 0.06265528500080109,
- "learning_rate": 0.000234,
- "loss": 0.0282,
- "step": 40
- },
- {
- "epoch": 0.4093816631130064,
- "grad_norm": 0.03309512510895729,
- "learning_rate": 0.00029987223755234907,
- "loss": 0.0157,
- "step": 60
- },
- {
- "epoch": 0.5458422174840085,
- "grad_norm": 0.03388677537441254,
- "learning_rate": 0.00029867524500941253,
- "loss": 0.0125,
- "step": 80
- },
- {
- "epoch": 0.6823027718550106,
- "grad_norm": 0.038252197206020355,
- "learning_rate": 0.00029622824461983995,
- "loss": 0.0128,
- "step": 100
- },
- {
- "epoch": 0.8187633262260128,
- "grad_norm": 0.06330293416976929,
- "learning_rate": 0.00029255180988050044,
- "loss": 0.0112,
- "step": 120
- },
- {
- "epoch": 0.9552238805970149,
- "grad_norm": 0.03660163655877113,
- "learning_rate": 0.0002876768509289324,
- "loss": 0.0106,
- "step": 140
- },
- {
- "epoch": 1.0886993603411514,
- "grad_norm": 0.028105631470680237,
- "learning_rate": 0.0002816443546620542,
- "loss": 0.0085,
- "step": 160
- },
- {
- "epoch": 1.2251599147121535,
- "grad_norm": 0.04313601925969124,
- "learning_rate": 0.00027450504013311436,
- "loss": 0.007,
- "step": 180
- },
- {
- "epoch": 1.3616204690831557,
- "grad_norm": 0.03731823340058327,
- "learning_rate": 0.00026631893212418224,
- "loss": 0.0056,
- "step": 200
- },
- {
- "epoch": 1.4980810234541577,
- "grad_norm": 0.02299409918487072,
- "learning_rate": 0.00025715485647942525,
- "loss": 0.0075,
- "step": 220
- },
- {
- "epoch": 1.63454157782516,
- "grad_norm": 0.03759332001209259,
- "learning_rate": 0.00024708986144223035,
- "loss": 0.0063,
- "step": 240
- },
- {
- "epoch": 1.7710021321961622,
- "grad_norm": 0.013533813878893852,
- "learning_rate": 0.00023620856986135804,
- "loss": 0.0052,
- "step": 260
- },
- {
- "epoch": 1.9074626865671642,
- "grad_norm": 0.0647633746266365,
- "learning_rate": 0.00022460246771254522,
- "loss": 0.0045,
- "step": 280
- },
- {
- "epoch": 2.0409381663113004,
- "grad_norm": 0.022289317101240158,
- "learning_rate": 0.0002123691349174121,
- "loss": 0.003,
- "step": 300
- },
- {
- "epoch": 2.177398720682303,
- "grad_norm": 0.040852464735507965,
- "learning_rate": 0.00019961142492666903,
- "loss": 0.0028,
- "step": 320
- },
- {
- "epoch": 2.313859275053305,
- "grad_norm": 0.04412081092596054,
- "learning_rate": 0.00018643659996539272,
- "loss": 0.0029,
- "step": 340
- },
- {
- "epoch": 2.450319829424307,
- "grad_norm": 0.047590937465429306,
- "learning_rate": 0.00017295542921091727,
- "loss": 0.0025,
- "step": 360
- },
- {
- "epoch": 2.5867803837953094,
- "grad_norm": 0.032162390649318695,
- "learning_rate": 0.00015928125748553563,
- "loss": 0.002,
- "step": 380
- },
- {
- "epoch": 2.7232409381663114,
- "grad_norm": 0.02574550174176693,
- "learning_rate": 0.00014552905229410626,
- "loss": 0.0014,
- "step": 400
- },
- {
- "epoch": 2.8597014925373134,
- "grad_norm": 0.014707539230585098,
- "learning_rate": 0.000131814437218731,
- "loss": 0.0012,
- "step": 420
- },
- {
- "epoch": 2.9961620469083154,
- "grad_norm": 0.04477572441101074,
- "learning_rate": 0.0001182527197973709,
- "loss": 0.0011,
- "step": 440
- },
- {
- "epoch": 3.129637526652452,
- "grad_norm": 0.012851215898990631,
- "learning_rate": 0.00010495792205964832,
- "loss": 0.0008,
- "step": 460
- },
- {
- "epoch": 3.266098081023454,
- "grad_norm": 0.02783357724547386,
- "learning_rate": 9.204182187073868e-05,
- "loss": 0.0007,
- "step": 480
- },
- {
- "epoch": 3.402558635394456,
- "grad_norm": 0.004735818598419428,
- "learning_rate": 7.961301314338808e-05,
- "loss": 0.0004,
- "step": 500
- },
- {
- "epoch": 3.539019189765458,
- "grad_norm": 0.003670594422146678,
- "learning_rate": 6.777599281945507e-05,
- "loss": 0.0004,
- "step": 520
- },
- {
- "epoch": 3.6754797441364606,
- "grad_norm": 0.013839378952980042,
- "learning_rate": 5.66302822973053e-05,
- "loss": 0.0004,
- "step": 540
- },
- {
- "epoch": 3.8119402985074626,
- "grad_norm": 0.00302703189663589,
- "learning_rate": 4.626959069178253e-05,
- "loss": 0.0004,
- "step": 560
- },
- {
- "epoch": 3.948400852878465,
- "grad_norm": 0.026532089337706566,
- "learning_rate": 3.6781026961763353e-05,
- "loss": 0.0004,
- "step": 580
- },
- {
- "epoch": 4.081876332622601,
- "grad_norm": 0.001038851565681398,
- "learning_rate": 2.8244367529442822e-05,
- "loss": 0.0002,
- "step": 600
- },
- {
- "epoch": 4.218336886993604,
- "grad_norm": 0.0008404534310102463,
- "learning_rate": 2.0731385548944725e-05,
- "loss": 0.0002,
- "step": 620
- },
- {
- "epoch": 4.354797441364606,
- "grad_norm": 0.0016892015701159835,
- "learning_rate": 1.4305247463523778e-05,
- "loss": 0.0002,
- "step": 640
- },
- {
- "epoch": 4.491257995735608,
- "grad_norm": 0.002141030738130212,
- "learning_rate": 9.019981924888797e-06,
- "loss": 0.0001,
- "step": 660
- },
- {
- "epoch": 4.62771855010661,
- "grad_norm": 0.0017192725790664554,
- "learning_rate": 4.920025539782397e-06,
- "loss": 0.0002,
- "step": 680
- },
- {
- "epoch": 4.764179104477612,
- "grad_norm": 0.002778939437121153,
- "learning_rate": 2.0398492630157303e-06,
- "loss": 0.0002,
- "step": 700
- },
- {
- "epoch": 4.900639658848614,
- "grad_norm": 0.002974987495690584,
- "learning_rate": 4.036685781107329e-07,
- "loss": 0.0001,
- "step": 720
- }
- ],
- "logging_steps": 20,
- "max_steps": 735,
- "num_input_tokens_seen": 0,
- "num_train_epochs": 5,
- "save_steps": 10000,
- "stateful_callbacks": {
- "TrainerControl": {
- "args": {
- "should_epoch_stop": false,
- "should_evaluate": false,
- "should_log": false,
- "should_save": true,
- "should_training_stop": true
- },
- "attributes": {}
- }
- },
- "total_flos": 1.8319121075920896e+17,
- "train_batch_size": 1,
- "trial_name": null,
- "trial_params": null
- }
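
A quick consistency check on the deleted `trainer_state.json`: 735 optimiser steps over 5 epochs gives 147 steps per epoch, and with `logging_steps` 20 the final logged entry lands at step 720, matching the log history above.

```python
# Derive the step bookkeeping implied by the deleted trainer_state.json.
max_steps, num_epochs, logging_steps = 735, 5, 20
steps_per_epoch = max_steps // num_epochs                        # 147
last_logged_step = (max_steps // logging_steps) * logging_steps  # 720
print(steps_per_epoch, last_logged_step)
```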
 
checkpoint-735/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
facebook_bart-base_ssml/added_tokens.json DELETED
@@ -1,6 +0,0 @@
- {
- "</prosody>": 50266,
- "<break/>": 50267,
- "<break>": 50268,
- "<prosody>": 50265
- }
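
This deleted `added_tokens.json` appends four SSML tags as single tokens on top of facebook/bart-base's 50265-entry vocabulary, which is why the checkpoint's `config.json` reports `vocab_size` 50269. The id assignment simply follows insertion order:

```python
BASE_VOCAB = 50265  # facebook/bart-base vocabulary size
ssml_tokens = ["<prosody>", "</prosody>", "<break/>", "<break>"]
ids = {tok: BASE_VOCAB + i for i, tok in enumerate(ssml_tokens)}
print(ids)                            # <prosody> -> 50265, ..., <break> -> 50268
print(BASE_VOCAB + len(ssml_tokens))  # 50269, the resized vocab_size
```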
 
facebook_bart-base_ssml/checkpoint-120/added_tokens.json DELETED
@@ -1,6 +0,0 @@
- {
- "</prosody>": 50266,
- "<break/>": 50267,
- "<break>": 50268,
- "<prosody>": 50265
- }
 
facebook_bart-base_ssml/checkpoint-120/config.json DELETED
@@ -1,73 +0,0 @@
- {
- "activation_dropout": 0.1,
- "activation_function": "gelu",
- "add_bias_logits": false,
- "add_final_layer_norm": false,
- "architectures": [
- "BartForConditionalGeneration"
- ],
- "attention_dropout": 0.1,
- "bos_token_id": 0,
- "classif_dropout": 0.1,
- "classifier_dropout": 0.0,
- "d_model": 768,
- "decoder_attention_heads": 12,
- "decoder_ffn_dim": 3072,
- "decoder_layerdrop": 0.0,
- "decoder_layers": 6,
- "decoder_start_token_id": 2,
- "dropout": 0.1,
- "early_stopping": null,
- "encoder_attention_heads": 12,
- "encoder_ffn_dim": 3072,
- "encoder_layerdrop": 0.0,
- "encoder_layers": 6,
- "eos_token_id": 2,
- "forced_eos_token_id": 2,
- "gradient_checkpointing": false,
- "id2label": {
- "0": "LABEL_0",
- "1": "LABEL_1",
- "2": "LABEL_2"
- },
- "init_std": 0.02,
- "is_encoder_decoder": true,
- "label2id": {
- "LABEL_0": 0,
- "LABEL_1": 1,
- "LABEL_2": 2
- },
- "max_position_embeddings": 1024,
- "model_type": "bart",
- "no_repeat_ngram_size": null,
- "normalize_before": false,
- "normalize_embedding": true,
- "num_beams": null,
- "num_hidden_layers": 6,
- "pad_token_id": 1,
- "scale_embedding": false,
- "task_specific_params": {
- "summarization": {
- "length_penalty": 1.0,
- "max_length": 128,
- "min_length": 12,
- "num_beams": 4
- },
- "summarization_cnn": {
- "length_penalty": 2.0,
- "max_length": 142,
- "min_length": 56,
- "num_beams": 4
- },
- "summarization_xsum": {
- "length_penalty": 1.0,
- "max_length": 62,
- "min_length": 11,
- "num_beams": 6
- }
- },
- "torch_dtype": "float32",
- "transformers_version": "4.52.2",
- "use_cache": true,
- "vocab_size": 50269
- }
 
facebook_bart-base_ssml/checkpoint-120/generation_config.json DELETED
@@ -1,13 +0,0 @@
- {
- "_from_model_config": true,
- "bos_token_id": 0,
- "decoder_start_token_id": 2,
- "early_stopping": true,
- "eos_token_id": 2,
- "forced_bos_token_id": 0,
- "forced_eos_token_id": 2,
- "no_repeat_ngram_size": 3,
- "num_beams": 4,
- "pad_token_id": 1,
- "transformers_version": "4.52.2"
- }
 
facebook_bart-base_ssml/checkpoint-120/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
facebook_bart-base_ssml/checkpoint-120/special_tokens_map.json DELETED
@@ -1,51 +0,0 @@
- {
- "bos_token": {
- "content": "<s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "cls_token": {
- "content": "<s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "eos_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "mask_token": {
- "content": "<mask>",
- "lstrip": true,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "sep_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "unk_token": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- }
- }
 
facebook_bart-base_ssml/checkpoint-120/tokenizer_config.json DELETED
@@ -1,89 +0,0 @@
- {
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "0": {
- "content": "<s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "1": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "2": {
- "content": "</s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "3": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "50264": {
- "content": "<mask>",
- "lstrip": true,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "50265": {
- "content": "<prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "50266": {
- "content": "</prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "50267": {
- "content": "<break/>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "50268": {
- "content": "<break>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "bos_token": "<s>",
- "clean_up_tokenization_spaces": false,
- "cls_token": "<s>",
- "eos_token": "</s>",
- "errors": "replace",
- "extra_special_tokens": {},
- "mask_token": "<mask>",
- "model_max_length": 1000000000000000019884624838656,
- "pad_token": "<pad>",
- "sep_token": "</s>",
- "tokenizer_class": "BartTokenizer",
- "unk_token": "<unk>"
- }
 
facebook_bart-base_ssml/checkpoint-120/trainer_state.json DELETED
@@ -1,41 +0,0 @@
- {
- "best_global_step": null,
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 15.0,
- "eval_steps": 500,
- "global_step": 120,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 12.5,
- "grad_norm": 225342.25,
- "learning_rate": 9.9e-06,
- "loss": 3.4615,
- "step": 100
- }
- ],
- "logging_steps": 100,
- "max_steps": 120,
- "num_input_tokens_seen": 0,
- "num_train_epochs": 15,
- "save_steps": 1000,
- "stateful_callbacks": {
- "TrainerControl": {
- "args": {
- "should_epoch_stop": false,
- "should_evaluate": false,
- "should_log": false,
- "should_save": true,
- "should_training_stop": true
- },
- "attributes": {}
- }
- },
- "total_flos": 525897695232000.0,
- "train_batch_size": 16,
- "trial_name": null,
- "trial_params": null
- }
 
facebook_bart-base_ssml/checkpoint-120/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
facebook_bart-base_ssml/checkpoint-3/added_tokens.json DELETED
@@ -1,5 +0,0 @@
- {
- "</prosody>": 50266,
- "<break/>": 50267,
- "<prosody>": 50265
- }
 
facebook_bart-base_ssml/checkpoint-3/config.json DELETED
@@ -1,73 +0,0 @@
- {
- "activation_dropout": 0.1,
- "activation_function": "gelu",
- "add_bias_logits": false,
- "add_final_layer_norm": false,
- "architectures": [
- "BartForConditionalGeneration"
- ],
- "attention_dropout": 0.1,
- "bos_token_id": 0,
- "classif_dropout": 0.1,
- "classifier_dropout": 0.0,
- "d_model": 768,
- "decoder_attention_heads": 12,
- "decoder_ffn_dim": 3072,
- "decoder_layerdrop": 0.0,
- "decoder_layers": 6,
- "decoder_start_token_id": 2,
- "dropout": 0.1,
- "early_stopping": null,
- "encoder_attention_heads": 12,
- "encoder_ffn_dim": 3072,
- "encoder_layerdrop": 0.0,
- "encoder_layers": 6,
- "eos_token_id": 2,
- "forced_eos_token_id": 2,
- "gradient_checkpointing": false,
- "id2label": {
- "0": "LABEL_0",
- "1": "LABEL_1",
- "2": "LABEL_2"
- },
- "init_std": 0.02,
- "is_encoder_decoder": true,
- "label2id": {
- "LABEL_0": 0,
- "LABEL_1": 1,
- "LABEL_2": 2
- },
- "max_position_embeddings": 1024,
- "model_type": "bart",
- "no_repeat_ngram_size": null,
- "normalize_before": false,
- "normalize_embedding": true,
- "num_beams": null,
- "num_hidden_layers": 6,
- "pad_token_id": 1,
- "scale_embedding": false,
- "task_specific_params": {
- "summarization": {
- "length_penalty": 1.0,
- "max_length": 128,
- "min_length": 12,
- "num_beams": 4
- },
- "summarization_cnn": {
- "length_penalty": 2.0,
- "max_length": 142,
- "min_length": 56,
- "num_beams": 4
- },
- "summarization_xsum": {
- "length_penalty": 1.0,
- "max_length": 62,
- "min_length": 11,
- "num_beams": 6
- }
- },
- "torch_dtype": "float32",
- "transformers_version": "4.52.2",
- "use_cache": true,
- "vocab_size": 50268
- }
 
facebook_bart-base_ssml/checkpoint-3/generation_config.json DELETED
@@ -1,13 +0,0 @@
- {
- "_from_model_config": true,
- "bos_token_id": 0,
- "decoder_start_token_id": 2,
- "early_stopping": true,
- "eos_token_id": 2,
- "forced_bos_token_id": 0,
- "forced_eos_token_id": 2,
- "no_repeat_ngram_size": 3,
- "num_beams": 4,
- "pad_token_id": 1,
- "transformers_version": "4.52.2"
- }
 
facebook_bart-base_ssml/checkpoint-3/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
facebook_bart-base_ssml/checkpoint-3/special_tokens_map.json DELETED
@@ -1,51 +0,0 @@
- {
- "bos_token": {
- "content": "<s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "cls_token": {
- "content": "<s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "eos_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "mask_token": {
- "content": "<mask>",
- "lstrip": true,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "sep_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "unk_token": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- }
- }
 
facebook_bart-base_ssml/checkpoint-3/tokenizer_config.json DELETED
@@ -1,81 +0,0 @@
- {
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "0": {
- "content": "<s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "1": {
- "content": "<pad>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "2": {
- "content": "</s>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "3": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "50264": {
- "content": "<mask>",
- "lstrip": true,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "50265": {
- "content": "<prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "50266": {
- "content": "</prosody>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "50267": {
- "content": "<break/>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "bos_token": "<s>",
- "clean_up_tokenization_spaces": false,
- "cls_token": "<s>",
- "eos_token": "</s>",
- "errors": "replace",
- "extra_special_tokens": {},
- "mask_token": "<mask>",
- "model_max_length": 1000000000000000019884624838656,
- "pad_token": "<pad>",
- "sep_token": "</s>",
- "tokenizer_class": "BartTokenizer",
- "unk_token": "<unk>"
- }
 
facebook_bart-base_ssml/checkpoint-3/trainer_state.json DELETED
@@ -1,33 +0,0 @@
- {
-   "best_global_step": null,
-   "best_metric": null,
-   "best_model_checkpoint": null,
-   "epoch": 3.0,
-   "eval_steps": 100,
-   "global_step": 3,
-   "is_hyper_param_search": false,
-   "is_local_process_zero": true,
-   "is_world_process_zero": true,
-   "log_history": [],
-   "logging_steps": 10,
-   "max_steps": 3,
-   "num_input_tokens_seen": 0,
-   "num_train_epochs": 3,
-   "save_steps": 500,
-   "stateful_callbacks": {
-     "TrainerControl": {
-       "args": {
-         "should_epoch_stop": false,
-         "should_evaluate": false,
-         "should_log": false,
-         "should_save": true,
-         "should_training_stop": true
-       },
-       "attributes": {}
-     }
-   },
-   "total_flos": 29267349995520.0,
-   "train_batch_size": 32,
-   "trial_name": null,
-   "trial_params": null
- }
 
facebook_bart-base_ssml/checkpoint-3/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
facebook_bart-base_ssml/checkpoint-990/added_tokens.json DELETED
@@ -1,6 +0,0 @@
- {
-   "</prosody>": 50266,
-   "<break/>": 50267,
-   "<break>": 50268,
-   "<prosody>": 50265
- }
 
facebook_bart-base_ssml/checkpoint-990/config.json DELETED
@@ -1,73 +0,0 @@
- {
-   "activation_dropout": 0.1,
-   "activation_function": "gelu",
-   "add_bias_logits": false,
-   "add_final_layer_norm": false,
-   "architectures": [
-     "BartForConditionalGeneration"
-   ],
-   "attention_dropout": 0.1,
-   "bos_token_id": 0,
-   "classif_dropout": 0.1,
-   "classifier_dropout": 0.0,
-   "d_model": 768,
-   "decoder_attention_heads": 12,
-   "decoder_ffn_dim": 3072,
-   "decoder_layerdrop": 0.0,
-   "decoder_layers": 6,
-   "decoder_start_token_id": 2,
-   "dropout": 0.1,
-   "early_stopping": null,
-   "encoder_attention_heads": 12,
-   "encoder_ffn_dim": 3072,
-   "encoder_layerdrop": 0.0,
-   "encoder_layers": 6,
-   "eos_token_id": 2,
-   "forced_eos_token_id": 2,
-   "gradient_checkpointing": false,
-   "id2label": {
-     "0": "LABEL_0",
-     "1": "LABEL_1",
-     "2": "LABEL_2"
-   },
-   "init_std": 0.02,
-   "is_encoder_decoder": true,
-   "label2id": {
-     "LABEL_0": 0,
-     "LABEL_1": 1,
-     "LABEL_2": 2
-   },
-   "max_position_embeddings": 1024,
-   "model_type": "bart",
-   "no_repeat_ngram_size": null,
-   "normalize_before": false,
-   "normalize_embedding": true,
-   "num_beams": null,
-   "num_hidden_layers": 6,
-   "pad_token_id": 1,
-   "scale_embedding": false,
-   "task_specific_params": {
-     "summarization": {
-       "length_penalty": 1.0,
-       "max_length": 128,
-       "min_length": 12,
-       "num_beams": 4
-     },
-     "summarization_cnn": {
-       "length_penalty": 2.0,
-       "max_length": 142,
-       "min_length": 56,
-       "num_beams": 4
-     },
-     "summarization_xsum": {
-       "length_penalty": 1.0,
-       "max_length": 62,
-       "min_length": 11,
-       "num_beams": 6
-     }
-   },
-   "torch_dtype": "float32",
-   "transformers_version": "4.52.2",
-   "use_cache": true,
-   "vocab_size": 50269
- }
 
facebook_bart-base_ssml/checkpoint-990/generation_config.json DELETED
@@ -1,13 +0,0 @@
- {
-   "_from_model_config": true,
-   "bos_token_id": 0,
-   "decoder_start_token_id": 2,
-   "early_stopping": true,
-   "eos_token_id": 2,
-   "forced_bos_token_id": 0,
-   "forced_eos_token_id": 2,
-   "no_repeat_ngram_size": 3,
-   "num_beams": 4,
-   "pad_token_id": 1,
-   "transformers_version": "4.52.2"
- }
 
facebook_bart-base_ssml/checkpoint-990/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
facebook_bart-base_ssml/checkpoint-990/special_tokens_map.json DELETED
@@ -1,51 +0,0 @@
- {
-   "bos_token": {
-     "content": "<s>",
-     "lstrip": false,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   },
-   "cls_token": {
-     "content": "<s>",
-     "lstrip": false,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   },
-   "eos_token": {
-     "content": "</s>",
-     "lstrip": false,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   },
-   "mask_token": {
-     "content": "<mask>",
-     "lstrip": true,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": {
-     "content": "<pad>",
-     "lstrip": false,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   },
-   "sep_token": {
-     "content": "</s>",
-     "lstrip": false,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   },
-   "unk_token": {
-     "content": "<unk>",
-     "lstrip": false,
-     "normalized": true,
-     "rstrip": false,
-     "single_word": false
-   }
- }
 
facebook_bart-base_ssml/checkpoint-990/tokenizer_config.json DELETED
@@ -1,89 +0,0 @@
- {
-   "add_prefix_space": false,
-   "added_tokens_decoder": {
-     "0": {
-       "content": "<s>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "1": {
-       "content": "<pad>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "2": {
-       "content": "</s>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "3": {
-       "content": "<unk>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "50264": {
-       "content": "<mask>",
-       "lstrip": true,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "50265": {
-       "content": "<prosody>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "50266": {
-       "content": "</prosody>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "50267": {
-       "content": "<break/>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "50268": {
-       "content": "<break>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     }
-   },
-   "bos_token": "<s>",
-   "clean_up_tokenization_spaces": false,
-   "cls_token": "<s>",
-   "eos_token": "</s>",
-   "errors": "replace",
-   "extra_special_tokens": {},
-   "mask_token": "<mask>",
-   "model_max_length": 1000000000000000019884624838656,
-   "pad_token": "<pad>",
-   "sep_token": "</s>",
-   "tokenizer_class": "BartTokenizer",
-   "unk_token": "<unk>"
- }
 
facebook_bart-base_ssml/checkpoint-990/trainer_state.json DELETED
@@ -1,33 +0,0 @@
- {
-   "best_global_step": null,
-   "best_metric": null,
-   "best_model_checkpoint": null,
-   "epoch": 15.0,
-   "eval_steps": 500000,
-   "global_step": 990,
-   "is_hyper_param_search": false,
-   "is_local_process_zero": true,
-   "is_world_process_zero": true,
-   "log_history": [],
-   "logging_steps": 100000,
-   "max_steps": 990,
-   "num_input_tokens_seen": 0,
-   "num_train_epochs": 15,
-   "save_steps": 1000000,
-   "stateful_callbacks": {
-     "TrainerControl": {
-       "args": {
-         "should_epoch_stop": false,
-         "should_evaluate": false,
-         "should_log": false,
-         "should_save": true,
-         "should_training_stop": true
-       },
-       "attributes": {}
-     }
-   },
-   "total_flos": 2478149981798400.0,
-   "train_batch_size": 16,
-   "trial_name": null,
-   "trial_params": null
- }
 
facebook_bart-base_ssml/checkpoint-990/vocab.json DELETED
The diff for this file is too large to render. See raw diff