Eldar Kurtic committed · Commit 8a48b80 · 1 Parent(s): bfd38b1
README.md ADDED
---
license: apache-2.0
language:
- en
tags:
- moe
- w4a16
- int4
- vllm
---

# Mixtral-8x22B-v0.1-quantized.w4a16

## Model Overview
- **Model Architecture:** Mixtral-8x22B-v0.1
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Weight quantization:** INT4
  - **Activation quantization:** None
- **Release Date:** 3/1/2025
- **Version:** 1.0
- **Model Developers:** Neural Magic

Quantized version of [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1).
It achieves an average score of 74.17 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.69.

### Model Optimizations

This model was obtained by quantizing only the weights to the INT4 data type, ready for inference with vLLM >= 0.5.2.
This optimization reduces the number of bits per parameter from 16 to 4, cutting disk size and GPU memory requirements by approximately 75%. Only the weights of the linear operators within transformer blocks are quantized; the MLP routers (expert gates) are left in their original precision.
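As a rough sanity check on the ~75% figure (assuming Mixtral-8x22B's approximately 141B total parameters, and ignoring the small overhead of quantization scales):

```python
# Back-of-envelope memory estimate for weight-only INT4 quantization.
# The parameter count below is an approximation (Mixtral-8x22B has ~141B total parameters).
TOTAL_PARAMS = 141e9

bf16_gb = TOTAL_PARAMS * 16 / 8 / 1e9   # 16 bits per weight
int4_gb = TOTAL_PARAMS * 4 / 8 / 1e9    # 4 bits per weight (scales/zero-points not counted)

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"INT4 weights: ~{int4_gb:.1f} GB")
print(f"Reduction: {1 - int4_gb / bf16_gb:.0%}")
```

In practice the saving is slightly below 75% because group-wise scales must also be stored, and the unquantized routers and embeddings remain at full precision.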

## Deployment

### Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 4096, 4
model_name = "neuralmagic-ent/Mixtral-8x22B-v0.1-quantized.w4a16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you? Please respond in pirate speak!"}],
]

# Render each conversation into token IDs with the tokenizer's chat template.
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
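As a sketch of that serving path (the `vllm serve` entrypoint and default port 8000 depend on your vLLM version; older releases use `python -m vllm.entrypoints.openai.api_server` instead):

```shell
# Launch an OpenAI-compatible server for this checkpoint (listens on port 8000 by default).
vllm serve neuralmagic-ent/Mixtral-8x22B-v0.1-quantized.w4a16 \
  --tensor-parallel-size 4 --max-model-len 4096

# Query it with the standard OpenAI chat completions API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "neuralmagic-ent/Mixtral-8x22B-v0.1-quantized.w4a16",
       "messages": [{"role": "user", "content": "Who are you?"}]}'
```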

## Creation

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the Python script below (saved as `quantize.py`) with the following command:

```bash
python quantize.py --model_path mistralai/Mixtral-8x22B-v0.1 --quant_path "output_dir" --calib_size 1024 --dampening_frac 0.1 --observer minmax --actorder False
```

```python
import argparse

import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map
from compressed_tensors.quantization import QuantizationScheme, QuantizationArgs, QuantizationType, QuantizationStrategy


def parse_actorder(value):
    # Interpret the input value for --actorder
    if value.lower() == "false":
        return False
    elif value.lower() == "group":
        return "group"
    else:
        raise argparse.ArgumentTypeError("Invalid value for --actorder. Use 'group' or 'False'.")


parser = argparse.ArgumentParser()
parser.add_argument('--model_path', type=str)
parser.add_argument('--quant_path', type=str)
parser.add_argument('--num_bits', type=int, default=4)
# Note: argparse's bool() treats any non-empty string as True; leave unset to keep the default.
parser.add_argument('--sequential_update', type=bool, default=True)
parser.add_argument('--calib_size', type=int, default=256)
parser.add_argument('--dampening_frac', type=float, default=0.05)
parser.add_argument('--observer', type=str, default="minmax")
parser.add_argument(
    '--actorder',
    type=parse_actorder,
    default=False,  # Default value is False
    help="Specify actorder as 'group' (string) or False (boolean)."
)

args = parser.parse_args()

# Spread the model across all available GPUs, reserving room for GPTQ Hessians.
device_map = calculate_offload_device_map(
    args.model_path,
    reserve_for_hessians=True,
    num_gpus=torch.cuda.device_count(),
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

model = SparseAutoModelForCausalLM.from_pretrained(
    args.model_path,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
    use_cache=False,
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

# Calibration data: a shuffled subset of Open-Platypus.
NUM_CALIBRATION_SAMPLES = args.calib_size
DATASET_ID = "garage-bAInd/Open-Platypus"
DATASET_SPLIT = "train"
ds = load_dataset(DATASET_ID, split=DATASET_SPLIT)
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))

def preprocess(example):
    concat_txt = example["instruction"] + "\n" + example["output"]
    return {"text": concat_txt}

ds = ds.map(preprocess)

def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        truncation=False,
        add_special_tokens=True,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)

# Symmetric INT4, group-wise (group size 128), weight-only quantization.
quant_scheme = QuantizationScheme(
    targets=["Linear"],
    weights=QuantizationArgs(
        num_bits=args.num_bits,
        type=QuantizationType.INT,
        symmetric=True,
        group_size=128,
        strategy=QuantizationStrategy.GROUP,
        observer=args.observer,
        actorder=args.actorder,
    ),
    input_activations=None,
    output_activations=None,
)

# Quantize all Linear layers except the LM head and the MoE router gates.
recipe = [
    GPTQModifier(
        targets=["Linear"],
        ignore=["lm_head", "re:.*block_sparse_moe.gate"],
        sequential_update=args.sequential_update,
        dampening_frac=args.dampening_frac,
        config_groups={"group_0": quant_scheme},
    )
]
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    num_calibration_samples=args.calib_size,
)

# Save to disk compressed.
SAVE_DIR = args.quant_path
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```
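The storage scheme the recipe above configures is symmetric group-wise INT4. The sketch below illustrates only that number format with plain round-to-nearest (GPTQ itself additionally compensates rounding error using second-order Hessian information, which this sketch does not model):

```python
import numpy as np

def quantize_group_symmetric_int4(w, group_size=128):
    """Round-to-nearest symmetric INT4 quantization per group of weights.

    Illustrative only: GPTQ adds error compensation on top of this format.
    """
    w = w.reshape(-1, group_size)
    # Symmetric INT4 covers [-8, 7]; each group's scale maps its max |w| to 7.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(256,)).astype(np.float32)
q, scales = quantize_group_symmetric_int4(w)
err = np.abs(dequantize(q, scales).reshape(-1) - w).max()
print(f"max abs error: {err:.4f}")
```

Round-to-nearest bounds the per-weight error by half a quantization step (`scale / 2`); GPTQ exists precisely because minimizing layer output error, rather than per-weight error, recovers more accuracy at 4 bits.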

## Evaluation

The model was evaluated on the OpenLLM Leaderboard [V1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard) benchmark using the following command:

```bash
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic-ent/Mixtral-8x22B-v0.1-quantized.w4a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=4,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks openllm \
  --write_out \
  --batch_size auto \
  --output_path output_dir \
  --show_config
```

### Accuracy

#### OpenLLM Leaderboard V1 evaluation scores

| Metric | mistralai/Mixtral-8x22B-v0.1 | neuralmagic-ent/Mixtral-8x22B-v0.1-quantized.w4a16 |
|-----------------------------------|:----------------------------:|:--------------------------------------------------:|
| ARC-Challenge (Acc-Norm, 25-shot) | 70.39 | 69.88 |
| GSM8K (Strict-Match, 5-shot) | 76.42 | 74.68 |
| HellaSwag (Acc-Norm, 10-shot) | 88.31 | 87.94 |
| MMLU (Acc, 5-shot) | 77.40 | 76.21 |
| TruthfulQA (MC2, 0-shot) | 51.17 | 51.15 |
| Winogrande (Acc, 5-shot) | 84.45 | 85.16 |
| **Average Score** | **74.69** | **74.17** |
| **Recovery (%)** | **100.00** | **99.30** |
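The average and recovery figures in the table follow directly from the per-task scores:

```python
# Per-task scores from the table above (baseline, then quantized).
baseline = [70.39, 76.42, 88.31, 77.40, 51.17, 84.45]
quantized = [69.88, 74.68, 87.94, 76.21, 51.15, 85.16]

avg_base = sum(baseline) / len(baseline)
avg_quant = sum(quantized) / len(quantized)
recovery = 100 * avg_quant / avg_base  # quantized average as a % of baseline

print(f"average (baseline):  {avg_base:.2f}")   # 74.69
print(f"average (quantized): {avg_quant:.2f}")  # 74.17
print(f"recovery:            {recovery:.2f}%")  # 99.30
```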
config.json ADDED
{
  "_name_or_path": "mistralai/Mixtral-8x22B-v0.1",
  "architectures": [
    "MixtralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.02,
  "intermediate_size": 16384,
  "max_position_embeddings": 65536,
  "model_type": "mixtral",
  "num_attention_heads": 48,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 56,
  "num_key_value_heads": 8,
  "num_local_experts": 8,
  "output_router_logits": false,
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": null,
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": null,
          "num_bits": 4,
          "observer": "minmax",
          "observer_kwargs": {},
          "strategy": "channel",
          "symmetric": true,
          "type": "int"
        }
      }
    },
    "format": "pack-quantized",
    "global_compression_ratio": 2.102105512128843,
    "ignore": [
      "model.layers.0.block_sparse_moe.gate",
      "model.layers.1.block_sparse_moe.gate",
      "model.layers.2.block_sparse_moe.gate",
      "model.layers.3.block_sparse_moe.gate",
      "model.layers.4.block_sparse_moe.gate",
      "model.layers.5.block_sparse_moe.gate",
      "model.layers.6.block_sparse_moe.gate",
      "model.layers.7.block_sparse_moe.gate",
      "model.layers.8.block_sparse_moe.gate",
      "model.layers.9.block_sparse_moe.gate",
      "model.layers.10.block_sparse_moe.gate",
      "model.layers.11.block_sparse_moe.gate",
      "model.layers.12.block_sparse_moe.gate",
      "model.layers.13.block_sparse_moe.gate",
      "model.layers.14.block_sparse_moe.gate",
      "model.layers.15.block_sparse_moe.gate",
      "model.layers.16.block_sparse_moe.gate",
      "model.layers.17.block_sparse_moe.gate",
      "model.layers.18.block_sparse_moe.gate",
      "model.layers.19.block_sparse_moe.gate",
      "model.layers.20.block_sparse_moe.gate",
      "model.layers.21.block_sparse_moe.gate",
      "model.layers.22.block_sparse_moe.gate",
      "model.layers.23.block_sparse_moe.gate",
      "model.layers.24.block_sparse_moe.gate",
      "model.layers.25.block_sparse_moe.gate",
      "model.layers.26.block_sparse_moe.gate",
      "model.layers.27.block_sparse_moe.gate",
      "model.layers.28.block_sparse_moe.gate",
      "model.layers.29.block_sparse_moe.gate",
      "model.layers.30.block_sparse_moe.gate",
      "model.layers.31.block_sparse_moe.gate",
      "model.layers.32.block_sparse_moe.gate",
      "model.layers.33.block_sparse_moe.gate",
      "model.layers.34.block_sparse_moe.gate",
      "model.layers.35.block_sparse_moe.gate",
      "model.layers.36.block_sparse_moe.gate",
      "model.layers.37.block_sparse_moe.gate",
      "model.layers.38.block_sparse_moe.gate",
      "model.layers.39.block_sparse_moe.gate",
      "model.layers.40.block_sparse_moe.gate",
      "model.layers.41.block_sparse_moe.gate",
      "model.layers.42.block_sparse_moe.gate",
      "model.layers.43.block_sparse_moe.gate",
      "model.layers.44.block_sparse_moe.gate",
      "model.layers.45.block_sparse_moe.gate",
      "model.layers.46.block_sparse_moe.gate",
      "model.layers.47.block_sparse_moe.gate",
      "model.layers.48.block_sparse_moe.gate",
      "model.layers.49.block_sparse_moe.gate",
      "model.layers.50.block_sparse_moe.gate",
      "model.layers.51.block_sparse_moe.gate",
      "model.layers.52.block_sparse_moe.gate",
      "model.layers.53.block_sparse_moe.gate",
      "model.layers.54.block_sparse_moe.gate",
      "model.layers.55.block_sparse_moe.gate",
      "lm_head"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed",
    "sparsity_config": {
      "format": "dense",
      "global_sparsity": 0.21683202346977445,
      "ignore": [
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.block_sparse_moe.gate",
        "model.layers.0.block_sparse_moe.experts.0.w1",
        "model.layers.0.block_sparse_moe.experts.0.w2",
        "model.layers.0.block_sparse_moe.experts.0.w3",
        "model.layers.0.block_sparse_moe.experts.1.w1",
        "model.layers.0.block_sparse_moe.experts.1.w2",
        "model.layers.0.block_sparse_moe.experts.1.w3",
        "model.layers.0.block_sparse_moe.experts.2.w1",
        "model.layers.0.block_sparse_moe.experts.2.w2",
        "model.layers.0.block_sparse_moe.experts.2.w3",
        "model.layers.0.block_sparse_moe.experts.3.w1",
        "model.layers.0.block_sparse_moe.experts.3.w2",
        "model.layers.0.block_sparse_moe.experts.3.w3",
        "model.layers.0.block_sparse_moe.experts.4.w1",
        "model.layers.0.block_sparse_moe.experts.4.w2",
        "model.layers.0.block_sparse_moe.experts.4.w3",
        "model.layers.0.block_sparse_moe.experts.5.w1",
        "model.layers.0.block_sparse_moe.experts.5.w2",
        "model.layers.0.block_sparse_moe.experts.5.w3",
        "model.layers.0.block_sparse_moe.experts.6.w1",
        "model.layers.0.block_sparse_moe.experts.6.w2",
        "model.layers.0.block_sparse_moe.experts.6.w3",
        "model.layers.0.block_sparse_moe.experts.7.w1",
        "model.layers.0.block_sparse_moe.experts.7.w2",
        "model.layers.0.block_sparse_moe.experts.7.w3",
        "model.layers.1.self_attn.v_proj",
        "model.layers.1.self_attn.o_proj",
        "model.layers.1.block_sparse_moe.gate",
        "model.layers.1.block_sparse_moe.experts.0.w1",
        "model.layers.1.block_sparse_moe.experts.0.w2",
        "model.layers.1.block_sparse_moe.experts.0.w3",
        "model.layers.1.block_sparse_moe.experts.1.w1",
        "model.layers.1.block_sparse_moe.experts.1.w2",
        "model.layers.1.block_sparse_moe.experts.1.w3",
        "model.layers.1.block_sparse_moe.experts.2.w1",
        "model.layers.1.block_sparse_moe.experts.2.w2",
        "model.layers.1.block_sparse_moe.experts.2.w3",
        "model.layers.1.block_sparse_moe.experts.3.w1",
        "model.layers.1.block_sparse_moe.experts.3.w2",
        "model.layers.1.block_sparse_moe.experts.3.w3",
        "model.layers.1.block_sparse_moe.experts.4.w1",
        "model.layers.1.block_sparse_moe.experts.4.w2",
        "model.layers.1.block_sparse_moe.experts.4.w3",
        "model.layers.1.block_sparse_moe.experts.5.w1",
        "model.layers.1.block_sparse_moe.experts.5.w2",
        "model.layers.1.block_sparse_moe.experts.5.w3",
        "model.layers.1.block_sparse_moe.experts.6.w1",
        "model.layers.1.block_sparse_moe.experts.6.w2",
        "model.layers.1.block_sparse_moe.experts.6.w3",
        "model.layers.1.block_sparse_moe.experts.7.w1",
        "model.layers.1.block_sparse_moe.experts.7.w2",
        "model.layers.1.block_sparse_moe.experts.7.w3",
        "model.layers.2.self_attn.q_proj",
        "model.layers.2.self_attn.k_proj",
        "model.layers.2.self_attn.v_proj",
        "model.layers.2.self_attn.o_proj",
        "model.layers.2.block_sparse_moe.gate",
        "model.layers.2.block_sparse_moe.experts.0.w1",
        "model.layers.2.block_sparse_moe.experts.0.w2",
        "model.layers.2.block_sparse_moe.experts.0.w3",
        "model.layers.2.block_sparse_moe.experts.1.w1",
        "model.layers.2.block_sparse_moe.experts.1.w2",
        "model.layers.2.block_sparse_moe.experts.1.w3",
        "model.layers.2.block_sparse_moe.experts.2.w1",
        "model.layers.2.block_sparse_moe.experts.2.w2",
        "model.layers.2.block_sparse_moe.experts.2.w3",
        "model.layers.2.block_sparse_moe.experts.3.w1",
        "model.layers.2.block_sparse_moe.experts.3.w2",
        "model.layers.2.block_sparse_moe.experts.3.w3",
        "model.layers.2.block_sparse_moe.experts.4.w1",
        "model.layers.2.block_sparse_moe.experts.4.w2",
        "model.layers.2.block_sparse_moe.experts.4.w3",
        "model.layers.2.block_sparse_moe.experts.5.w1",
        "model.layers.2.block_sparse_moe.experts.5.w2",
        "model.layers.2.block_sparse_moe.experts.5.w3",
        "model.layers.2.block_sparse_moe.experts.6.w1",
        "model.layers.2.block_sparse_moe.experts.6.w2",
        "model.layers.2.block_sparse_moe.experts.6.w3",
        "model.layers.2.block_sparse_moe.experts.7.w1",
        "model.layers.2.block_sparse_moe.experts.7.w2",
        "model.layers.2.block_sparse_moe.experts.7.w3",
        "model.layers.3.self_attn.q_proj",
        "model.layers.3.self_attn.k_proj",
        "model.layers.3.self_attn.v_proj",
        "model.layers.3.self_attn.o_proj",
        "model.layers.3.block_sparse_moe.gate",
        "model.layers.3.block_sparse_moe.experts.0.w1",
        "model.layers.3.block_sparse_moe.experts.0.w2",
        "model.layers.3.block_sparse_moe.experts.0.w3",
        "model.layers.3.block_sparse_moe.experts.1.w1",
        "model.layers.3.block_sparse_moe.experts.1.w2",
        "model.layers.3.block_sparse_moe.experts.1.w3",
        "model.layers.3.block_sparse_moe.experts.2.w1",
        "model.layers.3.block_sparse_moe.experts.2.w2",
        "model.layers.3.block_sparse_moe.experts.2.w3",
        "model.layers.3.block_sparse_moe.experts.3.w1",
        "model.layers.3.block_sparse_moe.experts.3.w2",
        "model.layers.3.block_sparse_moe.experts.3.w3",
        "model.layers.3.block_sparse_moe.experts.4.w1",
        "model.layers.3.block_sparse_moe.experts.4.w2",
        "model.layers.3.block_sparse_moe.experts.4.w3",
        "model.layers.3.block_sparse_moe.experts.5.w1",
        "model.layers.3.block_sparse_moe.experts.5.w2",
        "model.layers.3.block_sparse_moe.experts.5.w3",
        "model.layers.3.block_sparse_moe.experts.6.w1",
        "model.layers.3.block_sparse_moe.experts.6.w2",
        "model.layers.3.block_sparse_moe.experts.6.w3",
        "model.layers.3.block_sparse_moe.experts.7.w1",
        "model.layers.3.block_sparse_moe.experts.7.w2",
        "model.layers.3.block_sparse_moe.experts.7.w3",
        "model.layers.4.self_attn.q_proj",
        "model.layers.4.self_attn.k_proj",
        "model.layers.4.self_attn.v_proj",
        "model.layers.4.self_attn.o_proj",
        "model.layers.4.block_sparse_moe.gate",
        "model.layers.4.block_sparse_moe.experts.0.w1",
        "model.layers.4.block_sparse_moe.experts.0.w2",
        "model.layers.4.block_sparse_moe.experts.0.w3",
        "model.layers.4.block_sparse_moe.experts.1.w1",
        "model.layers.4.block_sparse_moe.experts.1.w2",
        "model.layers.4.block_sparse_moe.experts.1.w3",
        "model.layers.4.block_sparse_moe.experts.2.w1",
        "model.layers.4.block_sparse_moe.experts.2.w2",
        "model.layers.4.block_sparse_moe.experts.2.w3",
        "model.layers.4.block_sparse_moe.experts.3.w1",
        "model.layers.4.block_sparse_moe.experts.3.w2",
        "model.layers.4.block_sparse_moe.experts.3.w3",
        "model.layers.4.block_sparse_moe.experts.4.w1",
        "model.layers.4.block_sparse_moe.experts.4.w2",
        "model.layers.4.block_sparse_moe.experts.4.w3",
        "model.layers.4.block_sparse_moe.experts.5.w1",
        "model.layers.4.block_sparse_moe.experts.5.w2",
        "model.layers.4.block_sparse_moe.experts.5.w3",
        "model.layers.4.block_sparse_moe.experts.6.w1",
        "model.layers.4.block_sparse_moe.experts.6.w2",
        "model.layers.4.block_sparse_moe.experts.6.w3",
        "model.layers.4.block_sparse_moe.experts.7.w1",
        "model.layers.4.block_sparse_moe.experts.7.w2",
        "model.layers.4.block_sparse_moe.experts.7.w3",
        "model.layers.5.self_attn.q_proj",
        "model.layers.5.self_attn.k_proj",
        "model.layers.5.self_attn.v_proj",
        "model.layers.5.self_attn.o_proj",
        "model.layers.5.block_sparse_moe.gate",
        "model.layers.5.block_sparse_moe.experts.0.w1",
        "model.layers.5.block_sparse_moe.experts.0.w2",
        "model.layers.5.block_sparse_moe.experts.0.w3",
        "model.layers.5.block_sparse_moe.experts.1.w1",
        "model.layers.5.block_sparse_moe.experts.1.w2",
        "model.layers.5.block_sparse_moe.experts.1.w3",
        "model.layers.5.block_sparse_moe.experts.2.w1",
        "model.layers.5.block_sparse_moe.experts.2.w2",
        "model.layers.5.block_sparse_moe.experts.2.w3",
        "model.layers.5.block_sparse_moe.experts.3.w1",
        "model.layers.5.block_sparse_moe.experts.3.w2",
        "model.layers.5.block_sparse_moe.experts.3.w3",
        "model.layers.5.block_sparse_moe.experts.4.w1",
        "model.layers.5.block_sparse_moe.experts.4.w2",
        "model.layers.5.block_sparse_moe.experts.4.w3",
        "model.layers.5.block_sparse_moe.experts.5.w1",
        "model.layers.5.block_sparse_moe.experts.5.w2",
        "model.layers.5.block_sparse_moe.experts.5.w3",
        "model.layers.5.block_sparse_moe.experts.6.w1",
        "model.layers.5.block_sparse_moe.experts.6.w2",
        "model.layers.5.block_sparse_moe.experts.6.w3",
        "model.layers.5.block_sparse_moe.experts.7.w1",
        "model.layers.5.block_sparse_moe.experts.7.w2",
        "model.layers.5.block_sparse_moe.experts.7.w3",
        "model.layers.6.self_attn.q_proj",
        "model.layers.6.self_attn.k_proj",
        "model.layers.6.self_attn.v_proj",
        "model.layers.6.self_attn.o_proj",
        "model.layers.6.block_sparse_moe.gate",
        "model.layers.6.block_sparse_moe.experts.0.w1",
        "model.layers.6.block_sparse_moe.experts.0.w2",
        "model.layers.6.block_sparse_moe.experts.0.w3",
        "model.layers.6.block_sparse_moe.experts.1.w1",
        "model.layers.6.block_sparse_moe.experts.1.w2",
        "model.layers.6.block_sparse_moe.experts.1.w3",
        "model.layers.6.block_sparse_moe.experts.2.w1",
        "model.layers.6.block_sparse_moe.experts.2.w2",
        "model.layers.6.block_sparse_moe.experts.2.w3",
        "model.layers.6.block_sparse_moe.experts.3.w1",
        "model.layers.6.block_sparse_moe.experts.3.w2",
        "model.layers.6.block_sparse_moe.experts.3.w3",
        "model.layers.6.block_sparse_moe.experts.4.w1",
        "model.layers.6.block_sparse_moe.experts.4.w2",
        "model.layers.6.block_sparse_moe.experts.4.w3",
        "model.layers.6.block_sparse_moe.experts.5.w1",
        "model.layers.6.block_sparse_moe.experts.5.w2",
        "model.layers.6.block_sparse_moe.experts.5.w3",
        "model.layers.6.block_sparse_moe.experts.6.w1",
        "model.layers.6.block_sparse_moe.experts.6.w2",
        "model.layers.6.block_sparse_moe.experts.6.w3",
        "model.layers.6.block_sparse_moe.experts.7.w1",
        "model.layers.6.block_sparse_moe.experts.7.w2",
        "model.layers.6.block_sparse_moe.experts.7.w3",
        "model.layers.7.self_attn.q_proj",
        "model.layers.7.self_attn.k_proj",
        "model.layers.7.self_attn.v_proj",
        "model.layers.7.self_attn.o_proj",
        "model.layers.7.block_sparse_moe.gate",
        "model.layers.7.block_sparse_moe.experts.0.w1",
        "model.layers.7.block_sparse_moe.experts.0.w2",
        "model.layers.7.block_sparse_moe.experts.0.w3",
        "model.layers.7.block_sparse_moe.experts.1.w1",
        "model.layers.7.block_sparse_moe.experts.1.w2",
        "model.layers.7.block_sparse_moe.experts.1.w3",
        "model.layers.7.block_sparse_moe.experts.2.w1",
        "model.layers.7.block_sparse_moe.experts.2.w2",
        "model.layers.7.block_sparse_moe.experts.2.w3",
        "model.layers.7.block_sparse_moe.experts.3.w1",
        "model.layers.7.block_sparse_moe.experts.3.w2",
        "model.layers.7.block_sparse_moe.experts.3.w3",
        "model.layers.7.block_sparse_moe.experts.4.w1",
        "model.layers.7.block_sparse_moe.experts.4.w2",
        "model.layers.7.block_sparse_moe.experts.4.w3",
        "model.layers.7.block_sparse_moe.experts.5.w1",
        "model.layers.7.block_sparse_moe.experts.5.w2",
        "model.layers.7.block_sparse_moe.experts.5.w3",
        "model.layers.7.block_sparse_moe.experts.6.w1",
        "model.layers.7.block_sparse_moe.experts.6.w2",
        "model.layers.7.block_sparse_moe.experts.6.w3",
        "model.layers.7.block_sparse_moe.experts.7.w1",
        "model.layers.7.block_sparse_moe.experts.7.w2",
        "model.layers.7.block_sparse_moe.experts.7.w3",
        "model.layers.8.self_attn.q_proj",
        "model.layers.8.self_attn.k_proj",
        "model.layers.8.self_attn.v_proj",
        "model.layers.8.self_attn.o_proj",
        "model.layers.8.block_sparse_moe.gate",
        "model.layers.8.block_sparse_moe.experts.0.w1",
        "model.layers.8.block_sparse_moe.experts.0.w2",
        "model.layers.8.block_sparse_moe.experts.0.w3",
        "model.layers.8.block_sparse_moe.experts.1.w1",
        "model.layers.8.block_sparse_moe.experts.1.w2",
        "model.layers.8.block_sparse_moe.experts.1.w3",
        "model.layers.8.block_sparse_moe.experts.2.w1",
        "model.layers.8.block_sparse_moe.experts.2.w2",
        "model.layers.8.block_sparse_moe.experts.2.w3",
        "model.layers.8.block_sparse_moe.experts.3.w1",
        "model.layers.8.block_sparse_moe.experts.3.w2",
        "model.layers.8.block_sparse_moe.experts.3.w3",
        "model.layers.8.block_sparse_moe.experts.4.w1",
        "model.layers.8.block_sparse_moe.experts.4.w2",
        "model.layers.8.block_sparse_moe.experts.4.w3",
        "model.layers.8.block_sparse_moe.experts.5.w1",
        "model.layers.8.block_sparse_moe.experts.5.w2",
        "model.layers.8.block_sparse_moe.experts.5.w3",
        "model.layers.8.block_sparse_moe.experts.6.w1",
        "model.layers.8.block_sparse_moe.experts.6.w2",
        "model.layers.8.block_sparse_moe.experts.6.w3",
        "model.layers.8.block_sparse_moe.experts.7.w1",
        "model.layers.8.block_sparse_moe.experts.7.w2",
        "model.layers.8.block_sparse_moe.experts.7.w3",
        "model.layers.9.self_attn.q_proj",
        "model.layers.9.self_attn.k_proj",
        "model.layers.9.self_attn.v_proj",
        "model.layers.9.self_attn.o_proj",
        "model.layers.9.block_sparse_moe.gate",
        "model.layers.9.block_sparse_moe.experts.0.w1",
        "model.layers.9.block_sparse_moe.experts.0.w2",
        "model.layers.9.block_sparse_moe.experts.0.w3",
        "model.layers.9.block_sparse_moe.experts.1.w1",
        "model.layers.9.block_sparse_moe.experts.1.w2",
        "model.layers.9.block_sparse_moe.experts.1.w3",
        "model.layers.9.block_sparse_moe.experts.2.w1",
        "model.layers.9.block_sparse_moe.experts.2.w2",
        "model.layers.9.block_sparse_moe.experts.2.w3",
        "model.layers.9.block_sparse_moe.experts.3.w1",
        "model.layers.9.block_sparse_moe.experts.3.w2",
        "model.layers.9.block_sparse_moe.experts.3.w3",
        "model.layers.9.block_sparse_moe.experts.4.w1",
        "model.layers.9.block_sparse_moe.experts.4.w2",
        "model.layers.9.block_sparse_moe.experts.4.w3",
        "model.layers.9.block_sparse_moe.experts.5.w1",
        "model.layers.9.block_sparse_moe.experts.5.w2",
        "model.layers.9.block_sparse_moe.experts.5.w3",
        "model.layers.9.block_sparse_moe.experts.6.w1",
        "model.layers.9.block_sparse_moe.experts.6.w2",
        "model.layers.9.block_sparse_moe.experts.6.w3",
        "model.layers.9.block_sparse_moe.experts.7.w1",
        "model.layers.9.block_sparse_moe.experts.7.w2",
        "model.layers.9.block_sparse_moe.experts.7.w3",
        "model.layers.10.self_attn.q_proj",
        "model.layers.10.self_attn.k_proj",
        "model.layers.10.self_attn.v_proj",
        "model.layers.10.self_attn.o_proj",
        "model.layers.10.block_sparse_moe.gate",
        "model.layers.10.block_sparse_moe.experts.0.w1",
        "model.layers.10.block_sparse_moe.experts.0.w2",
        "model.layers.10.block_sparse_moe.experts.0.w3",
        "model.layers.10.block_sparse_moe.experts.1.w1",
        "model.layers.10.block_sparse_moe.experts.1.w2",
        "model.layers.10.block_sparse_moe.experts.1.w3",
        "model.layers.10.block_sparse_moe.experts.2.w1",
        "model.layers.10.block_sparse_moe.experts.2.w2",
        "model.layers.10.block_sparse_moe.experts.2.w3",
        "model.layers.10.block_sparse_moe.experts.3.w1",
        "model.layers.10.block_sparse_moe.experts.3.w2",
        "model.layers.10.block_sparse_moe.experts.3.w3",
        "model.layers.10.block_sparse_moe.experts.4.w1",
        "model.layers.10.block_sparse_moe.experts.4.w2",
        "model.layers.10.block_sparse_moe.experts.4.w3",
        "model.layers.10.block_sparse_moe.experts.5.w1",
        "model.layers.10.block_sparse_moe.experts.5.w2",
        "model.layers.10.block_sparse_moe.experts.5.w3",
        "model.layers.10.block_sparse_moe.experts.6.w1",
        "model.layers.10.block_sparse_moe.experts.6.w2",
        "model.layers.10.block_sparse_moe.experts.6.w3",
        "model.layers.10.block_sparse_moe.experts.7.w1",
        "model.layers.10.block_sparse_moe.experts.7.w2",
        "model.layers.10.block_sparse_moe.experts.7.w3",
        "model.layers.11.self_attn.q_proj",
        "model.layers.11.self_attn.k_proj",
        "model.layers.11.self_attn.v_proj",
        "model.layers.11.self_attn.o_proj",
        "model.layers.11.block_sparse_moe.gate",
        "model.layers.11.block_sparse_moe.experts.0.w1",
        "model.layers.11.block_sparse_moe.experts.0.w2",
        "model.layers.11.block_sparse_moe.experts.0.w3",
        "model.layers.11.block_sparse_moe.experts.1.w1",
        "model.layers.11.block_sparse_moe.experts.1.w2",
        "model.layers.11.block_sparse_moe.experts.1.w3",
        "model.layers.11.block_sparse_moe.experts.2.w1",
        "model.layers.11.block_sparse_moe.experts.2.w2",
        "model.layers.11.block_sparse_moe.experts.2.w3",
        "model.layers.11.block_sparse_moe.experts.3.w1",
        "model.layers.11.block_sparse_moe.experts.3.w2",
        "model.layers.11.block_sparse_moe.experts.3.w3",
        "model.layers.11.block_sparse_moe.experts.4.w1",
        "model.layers.11.block_sparse_moe.experts.4.w2",
        "model.layers.11.block_sparse_moe.experts.4.w3",
        "model.layers.11.block_sparse_moe.experts.5.w1",
        "model.layers.11.block_sparse_moe.experts.5.w2",
        "model.layers.11.block_sparse_moe.experts.5.w3",
        "model.layers.11.block_sparse_moe.experts.6.w1",
        "model.layers.11.block_sparse_moe.experts.6.w2",
        "model.layers.11.block_sparse_moe.experts.6.w3",
        "model.layers.11.block_sparse_moe.experts.7.w1",
        "model.layers.11.block_sparse_moe.experts.7.w2",
        "model.layers.11.block_sparse_moe.experts.7.w3",
        "model.layers.12.self_attn.q_proj",
        "model.layers.12.self_attn.k_proj",
457
+ "model.layers.12.self_attn.v_proj",
458
+ "model.layers.12.self_attn.o_proj",
459
+ "model.layers.12.block_sparse_moe.gate",
460
+ "model.layers.12.block_sparse_moe.experts.0.w1",
461
+ "model.layers.12.block_sparse_moe.experts.0.w2",
462
+ "model.layers.12.block_sparse_moe.experts.0.w3",
463
+ "model.layers.12.block_sparse_moe.experts.1.w1",
464
+ "model.layers.12.block_sparse_moe.experts.1.w2",
465
+ "model.layers.12.block_sparse_moe.experts.1.w3",
466
+ "model.layers.12.block_sparse_moe.experts.2.w1",
467
+ "model.layers.12.block_sparse_moe.experts.2.w2",
468
+ "model.layers.12.block_sparse_moe.experts.2.w3",
469
+ "model.layers.12.block_sparse_moe.experts.3.w1",
470
+ "model.layers.12.block_sparse_moe.experts.3.w2",
471
+ "model.layers.12.block_sparse_moe.experts.3.w3",
472
+ "model.layers.12.block_sparse_moe.experts.4.w1",
473
+ "model.layers.12.block_sparse_moe.experts.4.w2",
474
+ "model.layers.12.block_sparse_moe.experts.4.w3",
475
+ "model.layers.12.block_sparse_moe.experts.5.w1",
476
+ "model.layers.12.block_sparse_moe.experts.5.w2",
477
+ "model.layers.12.block_sparse_moe.experts.5.w3",
478
+ "model.layers.12.block_sparse_moe.experts.6.w1",
479
+ "model.layers.12.block_sparse_moe.experts.6.w2",
480
+ "model.layers.12.block_sparse_moe.experts.6.w3",
481
+ "model.layers.12.block_sparse_moe.experts.7.w1",
482
+ "model.layers.12.block_sparse_moe.experts.7.w2",
483
+ "model.layers.12.block_sparse_moe.experts.7.w3",
484
+ "model.layers.13.self_attn.q_proj",
485
+ "model.layers.13.self_attn.k_proj",
486
+ "model.layers.13.self_attn.v_proj",
487
+ "model.layers.13.self_attn.o_proj",
488
+ "model.layers.13.block_sparse_moe.gate",
489
+ "model.layers.13.block_sparse_moe.experts.0.w1",
490
+ "model.layers.13.block_sparse_moe.experts.0.w2",
491
+ "model.layers.13.block_sparse_moe.experts.0.w3",
492
+ "model.layers.13.block_sparse_moe.experts.1.w1",
493
+ "model.layers.13.block_sparse_moe.experts.1.w2",
494
+ "model.layers.13.block_sparse_moe.experts.1.w3",
495
+ "model.layers.13.block_sparse_moe.experts.2.w1",
496
+ "model.layers.13.block_sparse_moe.experts.2.w2",
497
+ "model.layers.13.block_sparse_moe.experts.2.w3",
498
+ "model.layers.13.block_sparse_moe.experts.3.w1",
499
+ "model.layers.13.block_sparse_moe.experts.3.w2",
500
+ "model.layers.13.block_sparse_moe.experts.3.w3",
501
+ "model.layers.13.block_sparse_moe.experts.4.w1",
502
+ "model.layers.13.block_sparse_moe.experts.4.w2",
503
+ "model.layers.13.block_sparse_moe.experts.4.w3",
504
+ "model.layers.13.block_sparse_moe.experts.5.w1",
505
+ "model.layers.13.block_sparse_moe.experts.5.w2",
506
+ "model.layers.13.block_sparse_moe.experts.5.w3",
507
+ "model.layers.13.block_sparse_moe.experts.6.w1",
508
+ "model.layers.13.block_sparse_moe.experts.6.w2",
509
+ "model.layers.13.block_sparse_moe.experts.6.w3",
510
+ "model.layers.13.block_sparse_moe.experts.7.w1",
511
+ "model.layers.13.block_sparse_moe.experts.7.w2",
512
+ "model.layers.13.block_sparse_moe.experts.7.w3",
513
+ "model.layers.14.self_attn.q_proj",
514
+ "model.layers.14.self_attn.k_proj",
515
+ "model.layers.14.self_attn.v_proj",
516
+ "model.layers.14.self_attn.o_proj",
517
+ "model.layers.14.block_sparse_moe.gate",
518
+ "model.layers.14.block_sparse_moe.experts.0.w1",
519
+ "model.layers.14.block_sparse_moe.experts.0.w2",
520
+ "model.layers.14.block_sparse_moe.experts.0.w3",
521
+ "model.layers.14.block_sparse_moe.experts.1.w1",
522
+ "model.layers.14.block_sparse_moe.experts.1.w2",
523
+ "model.layers.14.block_sparse_moe.experts.1.w3",
524
+ "model.layers.14.block_sparse_moe.experts.2.w1",
525
+ "model.layers.14.block_sparse_moe.experts.2.w2",
526
+ "model.layers.14.block_sparse_moe.experts.2.w3",
527
+ "model.layers.14.block_sparse_moe.experts.3.w1",
528
+ "model.layers.14.block_sparse_moe.experts.3.w2",
529
+ "model.layers.14.block_sparse_moe.experts.3.w3",
530
+ "model.layers.14.block_sparse_moe.experts.4.w1",
531
+ "model.layers.14.block_sparse_moe.experts.4.w2",
532
+ "model.layers.14.block_sparse_moe.experts.4.w3",
533
+ "model.layers.14.block_sparse_moe.experts.5.w1",
534
+ "model.layers.14.block_sparse_moe.experts.5.w2",
535
+ "model.layers.14.block_sparse_moe.experts.5.w3",
536
+ "model.layers.14.block_sparse_moe.experts.6.w1",
537
+ "model.layers.14.block_sparse_moe.experts.6.w2",
538
+ "model.layers.14.block_sparse_moe.experts.6.w3",
539
+ "model.layers.14.block_sparse_moe.experts.7.w1",
540
+ "model.layers.14.block_sparse_moe.experts.7.w2",
541
+ "model.layers.14.block_sparse_moe.experts.7.w3",
542
+ "model.layers.15.self_attn.q_proj",
+ "model.layers.15.self_attn.k_proj",
+ "model.layers.15.self_attn.v_proj",
+ "model.layers.15.self_attn.o_proj",
+ "model.layers.15.block_sparse_moe.gate",
+ "model.layers.15.block_sparse_moe.experts.0.w1",
+ "model.layers.15.block_sparse_moe.experts.0.w2",
+ "model.layers.15.block_sparse_moe.experts.0.w3",
+ "model.layers.15.block_sparse_moe.experts.1.w1",
+ "model.layers.15.block_sparse_moe.experts.1.w2",
+ "model.layers.15.block_sparse_moe.experts.1.w3",
+ "model.layers.15.block_sparse_moe.experts.2.w1",
+ "model.layers.15.block_sparse_moe.experts.2.w2",
+ "model.layers.15.block_sparse_moe.experts.2.w3",
+ "model.layers.15.block_sparse_moe.experts.3.w1",
+ "model.layers.15.block_sparse_moe.experts.3.w2",
+ "model.layers.15.block_sparse_moe.experts.3.w3",
+ "model.layers.15.block_sparse_moe.experts.4.w1",
+ "model.layers.15.block_sparse_moe.experts.4.w2",
+ "model.layers.15.block_sparse_moe.experts.4.w3",
+ "model.layers.15.block_sparse_moe.experts.5.w1",
+ "model.layers.15.block_sparse_moe.experts.5.w2",
+ "model.layers.15.block_sparse_moe.experts.5.w3",
+ "model.layers.15.block_sparse_moe.experts.6.w1",
+ "model.layers.15.block_sparse_moe.experts.6.w2",
+ "model.layers.15.block_sparse_moe.experts.6.w3",
+ "model.layers.15.block_sparse_moe.experts.7.w1",
+ "model.layers.15.block_sparse_moe.experts.7.w2",
+ "model.layers.15.block_sparse_moe.experts.7.w3",
+ "model.layers.16.self_attn.q_proj",
+ "model.layers.16.self_attn.k_proj",
+ "model.layers.16.self_attn.v_proj",
+ "model.layers.16.self_attn.o_proj",
+ "model.layers.16.block_sparse_moe.gate",
+ "model.layers.16.block_sparse_moe.experts.0.w1",
+ "model.layers.16.block_sparse_moe.experts.0.w2",
+ "model.layers.16.block_sparse_moe.experts.0.w3",
+ "model.layers.16.block_sparse_moe.experts.1.w1",
+ "model.layers.16.block_sparse_moe.experts.1.w2",
+ "model.layers.16.block_sparse_moe.experts.1.w3",
+ "model.layers.16.block_sparse_moe.experts.2.w1",
+ "model.layers.16.block_sparse_moe.experts.2.w2",
+ "model.layers.16.block_sparse_moe.experts.2.w3",
+ "model.layers.16.block_sparse_moe.experts.3.w1",
+ "model.layers.16.block_sparse_moe.experts.3.w2",
+ "model.layers.16.block_sparse_moe.experts.3.w3",
+ "model.layers.16.block_sparse_moe.experts.4.w1",
+ "model.layers.16.block_sparse_moe.experts.4.w2",
+ "model.layers.16.block_sparse_moe.experts.4.w3",
+ "model.layers.16.block_sparse_moe.experts.5.w1",
+ "model.layers.16.block_sparse_moe.experts.5.w2",
+ "model.layers.16.block_sparse_moe.experts.5.w3",
+ "model.layers.16.block_sparse_moe.experts.6.w1",
+ "model.layers.16.block_sparse_moe.experts.6.w2",
+ "model.layers.16.block_sparse_moe.experts.6.w3",
+ "model.layers.16.block_sparse_moe.experts.7.w1",
+ "model.layers.16.block_sparse_moe.experts.7.w2",
+ "model.layers.16.block_sparse_moe.experts.7.w3",
+ "model.layers.17.self_attn.q_proj",
+ "model.layers.17.self_attn.k_proj",
+ "model.layers.17.self_attn.v_proj",
+ "model.layers.17.self_attn.o_proj",
+ "model.layers.17.block_sparse_moe.gate",
+ "model.layers.17.block_sparse_moe.experts.0.w1",
+ "model.layers.17.block_sparse_moe.experts.0.w2",
+ "model.layers.17.block_sparse_moe.experts.0.w3",
+ "model.layers.17.block_sparse_moe.experts.1.w1",
+ "model.layers.17.block_sparse_moe.experts.1.w2",
+ "model.layers.17.block_sparse_moe.experts.1.w3",
+ "model.layers.17.block_sparse_moe.experts.2.w1",
+ "model.layers.17.block_sparse_moe.experts.2.w2",
+ "model.layers.17.block_sparse_moe.experts.2.w3",
+ "model.layers.17.block_sparse_moe.experts.3.w1",
+ "model.layers.17.block_sparse_moe.experts.3.w2",
+ "model.layers.17.block_sparse_moe.experts.3.w3",
+ "model.layers.17.block_sparse_moe.experts.4.w1",
+ "model.layers.17.block_sparse_moe.experts.4.w2",
+ "model.layers.17.block_sparse_moe.experts.4.w3",
+ "model.layers.17.block_sparse_moe.experts.5.w1",
+ "model.layers.17.block_sparse_moe.experts.5.w2",
+ "model.layers.17.block_sparse_moe.experts.5.w3",
+ "model.layers.17.block_sparse_moe.experts.6.w1",
+ "model.layers.17.block_sparse_moe.experts.6.w2",
+ "model.layers.17.block_sparse_moe.experts.6.w3",
+ "model.layers.17.block_sparse_moe.experts.7.w1",
+ "model.layers.17.block_sparse_moe.experts.7.w2",
+ "model.layers.17.block_sparse_moe.experts.7.w3",
+ "model.layers.18.self_attn.q_proj",
+ "model.layers.18.self_attn.k_proj",
+ "model.layers.18.self_attn.v_proj",
+ "model.layers.18.self_attn.o_proj",
+ "model.layers.18.block_sparse_moe.gate",
+ "model.layers.18.block_sparse_moe.experts.0.w1",
+ "model.layers.18.block_sparse_moe.experts.0.w2",
+ "model.layers.18.block_sparse_moe.experts.0.w3",
+ "model.layers.18.block_sparse_moe.experts.1.w1",
+ "model.layers.18.block_sparse_moe.experts.1.w2",
+ "model.layers.18.block_sparse_moe.experts.1.w3",
+ "model.layers.18.block_sparse_moe.experts.2.w1",
+ "model.layers.18.block_sparse_moe.experts.2.w2",
+ "model.layers.18.block_sparse_moe.experts.2.w3",
+ "model.layers.18.block_sparse_moe.experts.3.w1",
+ "model.layers.18.block_sparse_moe.experts.3.w2",
+ "model.layers.18.block_sparse_moe.experts.3.w3",
+ "model.layers.18.block_sparse_moe.experts.4.w1",
+ "model.layers.18.block_sparse_moe.experts.4.w2",
+ "model.layers.18.block_sparse_moe.experts.4.w3",
+ "model.layers.18.block_sparse_moe.experts.5.w1",
+ "model.layers.18.block_sparse_moe.experts.5.w2",
+ "model.layers.18.block_sparse_moe.experts.5.w3",
+ "model.layers.18.block_sparse_moe.experts.6.w1",
+ "model.layers.18.block_sparse_moe.experts.6.w2",
+ "model.layers.18.block_sparse_moe.experts.6.w3",
+ "model.layers.18.block_sparse_moe.experts.7.w1",
+ "model.layers.18.block_sparse_moe.experts.7.w2",
+ "model.layers.18.block_sparse_moe.experts.7.w3",
+ "model.layers.19.self_attn.q_proj",
+ "model.layers.19.self_attn.k_proj",
+ "model.layers.19.self_attn.v_proj",
+ "model.layers.19.self_attn.o_proj",
+ "model.layers.19.block_sparse_moe.gate",
+ "model.layers.19.block_sparse_moe.experts.0.w1",
+ "model.layers.19.block_sparse_moe.experts.0.w2",
+ "model.layers.19.block_sparse_moe.experts.0.w3",
+ "model.layers.19.block_sparse_moe.experts.1.w1",
+ "model.layers.19.block_sparse_moe.experts.1.w2",
+ "model.layers.19.block_sparse_moe.experts.1.w3",
+ "model.layers.19.block_sparse_moe.experts.2.w1",
+ "model.layers.19.block_sparse_moe.experts.2.w2",
+ "model.layers.19.block_sparse_moe.experts.2.w3",
+ "model.layers.19.block_sparse_moe.experts.3.w1",
+ "model.layers.19.block_sparse_moe.experts.3.w2",
+ "model.layers.19.block_sparse_moe.experts.3.w3",
+ "model.layers.19.block_sparse_moe.experts.4.w1",
+ "model.layers.19.block_sparse_moe.experts.4.w2",
+ "model.layers.19.block_sparse_moe.experts.4.w3",
+ "model.layers.19.block_sparse_moe.experts.5.w1",
+ "model.layers.19.block_sparse_moe.experts.5.w2",
+ "model.layers.19.block_sparse_moe.experts.5.w3",
+ "model.layers.19.block_sparse_moe.experts.6.w1",
+ "model.layers.19.block_sparse_moe.experts.6.w2",
+ "model.layers.19.block_sparse_moe.experts.6.w3",
+ "model.layers.19.block_sparse_moe.experts.7.w1",
+ "model.layers.19.block_sparse_moe.experts.7.w2",
+ "model.layers.19.block_sparse_moe.experts.7.w3",
+ "model.layers.20.self_attn.q_proj",
+ "model.layers.20.self_attn.k_proj",
+ "model.layers.20.self_attn.v_proj",
+ "model.layers.20.self_attn.o_proj",
+ "model.layers.20.block_sparse_moe.gate",
+ "model.layers.20.block_sparse_moe.experts.0.w1",
+ "model.layers.20.block_sparse_moe.experts.0.w2",
+ "model.layers.20.block_sparse_moe.experts.0.w3",
+ "model.layers.20.block_sparse_moe.experts.1.w1",
+ "model.layers.20.block_sparse_moe.experts.1.w2",
+ "model.layers.20.block_sparse_moe.experts.1.w3",
+ "model.layers.20.block_sparse_moe.experts.2.w1",
+ "model.layers.20.block_sparse_moe.experts.2.w2",
+ "model.layers.20.block_sparse_moe.experts.2.w3",
+ "model.layers.20.block_sparse_moe.experts.3.w1",
+ "model.layers.20.block_sparse_moe.experts.3.w2",
+ "model.layers.20.block_sparse_moe.experts.3.w3",
+ "model.layers.20.block_sparse_moe.experts.4.w1",
+ "model.layers.20.block_sparse_moe.experts.4.w2",
+ "model.layers.20.block_sparse_moe.experts.4.w3",
+ "model.layers.20.block_sparse_moe.experts.5.w1",
+ "model.layers.20.block_sparse_moe.experts.5.w2",
+ "model.layers.20.block_sparse_moe.experts.5.w3",
+ "model.layers.20.block_sparse_moe.experts.6.w1",
+ "model.layers.20.block_sparse_moe.experts.6.w2",
+ "model.layers.20.block_sparse_moe.experts.6.w3",
+ "model.layers.20.block_sparse_moe.experts.7.w1",
+ "model.layers.20.block_sparse_moe.experts.7.w2",
+ "model.layers.20.block_sparse_moe.experts.7.w3",
716
+ "model.layers.21.self_attn.q_proj",
+ "model.layers.21.self_attn.k_proj",
+ "model.layers.21.self_attn.v_proj",
+ "model.layers.21.self_attn.o_proj",
+ "model.layers.21.block_sparse_moe.gate",
+ "model.layers.21.block_sparse_moe.experts.0.w1",
+ "model.layers.21.block_sparse_moe.experts.0.w2",
+ "model.layers.21.block_sparse_moe.experts.0.w3",
+ "model.layers.21.block_sparse_moe.experts.1.w1",
+ "model.layers.21.block_sparse_moe.experts.1.w2",
+ "model.layers.21.block_sparse_moe.experts.1.w3",
+ "model.layers.21.block_sparse_moe.experts.2.w1",
+ "model.layers.21.block_sparse_moe.experts.2.w2",
+ "model.layers.21.block_sparse_moe.experts.2.w3",
+ "model.layers.21.block_sparse_moe.experts.3.w1",
+ "model.layers.21.block_sparse_moe.experts.3.w2",
+ "model.layers.21.block_sparse_moe.experts.3.w3",
+ "model.layers.21.block_sparse_moe.experts.4.w1",
+ "model.layers.21.block_sparse_moe.experts.4.w2",
+ "model.layers.21.block_sparse_moe.experts.4.w3",
+ "model.layers.21.block_sparse_moe.experts.5.w1",
+ "model.layers.21.block_sparse_moe.experts.5.w2",
+ "model.layers.21.block_sparse_moe.experts.5.w3",
+ "model.layers.21.block_sparse_moe.experts.6.w1",
+ "model.layers.21.block_sparse_moe.experts.6.w2",
+ "model.layers.21.block_sparse_moe.experts.6.w3",
+ "model.layers.21.block_sparse_moe.experts.7.w1",
+ "model.layers.21.block_sparse_moe.experts.7.w2",
+ "model.layers.21.block_sparse_moe.experts.7.w3",
+ "model.layers.22.self_attn.q_proj",
+ "model.layers.22.self_attn.k_proj",
+ "model.layers.22.self_attn.v_proj",
+ "model.layers.22.self_attn.o_proj",
+ "model.layers.22.block_sparse_moe.gate",
+ "model.layers.22.block_sparse_moe.experts.0.w1",
+ "model.layers.22.block_sparse_moe.experts.0.w2",
+ "model.layers.22.block_sparse_moe.experts.0.w3",
+ "model.layers.22.block_sparse_moe.experts.1.w1",
+ "model.layers.22.block_sparse_moe.experts.1.w2",
+ "model.layers.22.block_sparse_moe.experts.1.w3",
+ "model.layers.22.block_sparse_moe.experts.2.w1",
+ "model.layers.22.block_sparse_moe.experts.2.w2",
+ "model.layers.22.block_sparse_moe.experts.2.w3",
+ "model.layers.22.block_sparse_moe.experts.3.w1",
+ "model.layers.22.block_sparse_moe.experts.3.w2",
+ "model.layers.22.block_sparse_moe.experts.3.w3",
+ "model.layers.22.block_sparse_moe.experts.4.w1",
+ "model.layers.22.block_sparse_moe.experts.4.w2",
+ "model.layers.22.block_sparse_moe.experts.4.w3",
+ "model.layers.22.block_sparse_moe.experts.5.w1",
+ "model.layers.22.block_sparse_moe.experts.5.w2",
+ "model.layers.22.block_sparse_moe.experts.5.w3",
+ "model.layers.22.block_sparse_moe.experts.6.w1",
+ "model.layers.22.block_sparse_moe.experts.6.w2",
+ "model.layers.22.block_sparse_moe.experts.6.w3",
+ "model.layers.22.block_sparse_moe.experts.7.w1",
+ "model.layers.22.block_sparse_moe.experts.7.w2",
+ "model.layers.22.block_sparse_moe.experts.7.w3",
+ "model.layers.23.self_attn.q_proj",
+ "model.layers.23.self_attn.k_proj",
+ "model.layers.23.self_attn.v_proj",
+ "model.layers.23.self_attn.o_proj",
+ "model.layers.23.block_sparse_moe.gate",
+ "model.layers.23.block_sparse_moe.experts.0.w1",
+ "model.layers.23.block_sparse_moe.experts.0.w2",
+ "model.layers.23.block_sparse_moe.experts.0.w3",
+ "model.layers.23.block_sparse_moe.experts.1.w1",
+ "model.layers.23.block_sparse_moe.experts.1.w2",
+ "model.layers.23.block_sparse_moe.experts.1.w3",
+ "model.layers.23.block_sparse_moe.experts.2.w1",
+ "model.layers.23.block_sparse_moe.experts.2.w2",
+ "model.layers.23.block_sparse_moe.experts.2.w3",
+ "model.layers.23.block_sparse_moe.experts.3.w1",
+ "model.layers.23.block_sparse_moe.experts.3.w2",
+ "model.layers.23.block_sparse_moe.experts.3.w3",
+ "model.layers.23.block_sparse_moe.experts.4.w1",
+ "model.layers.23.block_sparse_moe.experts.4.w2",
+ "model.layers.23.block_sparse_moe.experts.4.w3",
+ "model.layers.23.block_sparse_moe.experts.5.w1",
+ "model.layers.23.block_sparse_moe.experts.5.w2",
+ "model.layers.23.block_sparse_moe.experts.5.w3",
+ "model.layers.23.block_sparse_moe.experts.6.w1",
+ "model.layers.23.block_sparse_moe.experts.6.w2",
+ "model.layers.23.block_sparse_moe.experts.6.w3",
+ "model.layers.23.block_sparse_moe.experts.7.w1",
+ "model.layers.23.block_sparse_moe.experts.7.w2",
+ "model.layers.23.block_sparse_moe.experts.7.w3",
+ "model.layers.24.self_attn.q_proj",
+ "model.layers.24.self_attn.k_proj",
+ "model.layers.24.self_attn.v_proj",
+ "model.layers.24.self_attn.o_proj",
+ "model.layers.24.block_sparse_moe.gate",
+ "model.layers.24.block_sparse_moe.experts.0.w1",
+ "model.layers.24.block_sparse_moe.experts.0.w2",
+ "model.layers.24.block_sparse_moe.experts.0.w3",
+ "model.layers.24.block_sparse_moe.experts.1.w1",
+ "model.layers.24.block_sparse_moe.experts.1.w2",
+ "model.layers.24.block_sparse_moe.experts.1.w3",
+ "model.layers.24.block_sparse_moe.experts.2.w1",
+ "model.layers.24.block_sparse_moe.experts.2.w2",
+ "model.layers.24.block_sparse_moe.experts.2.w3",
+ "model.layers.24.block_sparse_moe.experts.3.w1",
+ "model.layers.24.block_sparse_moe.experts.3.w2",
+ "model.layers.24.block_sparse_moe.experts.3.w3",
+ "model.layers.24.block_sparse_moe.experts.4.w1",
+ "model.layers.24.block_sparse_moe.experts.4.w2",
+ "model.layers.24.block_sparse_moe.experts.4.w3",
+ "model.layers.24.block_sparse_moe.experts.5.w1",
+ "model.layers.24.block_sparse_moe.experts.5.w2",
+ "model.layers.24.block_sparse_moe.experts.5.w3",
+ "model.layers.24.block_sparse_moe.experts.6.w1",
+ "model.layers.24.block_sparse_moe.experts.6.w2",
+ "model.layers.24.block_sparse_moe.experts.6.w3",
+ "model.layers.24.block_sparse_moe.experts.7.w1",
+ "model.layers.24.block_sparse_moe.experts.7.w2",
+ "model.layers.24.block_sparse_moe.experts.7.w3",
+ "model.layers.25.self_attn.q_proj",
+ "model.layers.25.self_attn.k_proj",
+ "model.layers.25.self_attn.v_proj",
+ "model.layers.25.self_attn.o_proj",
+ "model.layers.25.block_sparse_moe.gate",
+ "model.layers.25.block_sparse_moe.experts.0.w1",
+ "model.layers.25.block_sparse_moe.experts.0.w2",
+ "model.layers.25.block_sparse_moe.experts.0.w3",
+ "model.layers.25.block_sparse_moe.experts.1.w1",
+ "model.layers.25.block_sparse_moe.experts.1.w2",
+ "model.layers.25.block_sparse_moe.experts.1.w3",
+ "model.layers.25.block_sparse_moe.experts.2.w1",
+ "model.layers.25.block_sparse_moe.experts.2.w2",
+ "model.layers.25.block_sparse_moe.experts.2.w3",
+ "model.layers.25.block_sparse_moe.experts.3.w1",
+ "model.layers.25.block_sparse_moe.experts.3.w2",
+ "model.layers.25.block_sparse_moe.experts.3.w3",
+ "model.layers.25.block_sparse_moe.experts.4.w1",
+ "model.layers.25.block_sparse_moe.experts.4.w2",
+ "model.layers.25.block_sparse_moe.experts.4.w3",
+ "model.layers.25.block_sparse_moe.experts.5.w1",
+ "model.layers.25.block_sparse_moe.experts.5.w2",
+ "model.layers.25.block_sparse_moe.experts.5.w3",
+ "model.layers.25.block_sparse_moe.experts.6.w1",
+ "model.layers.25.block_sparse_moe.experts.6.w2",
+ "model.layers.25.block_sparse_moe.experts.6.w3",
+ "model.layers.25.block_sparse_moe.experts.7.w1",
+ "model.layers.25.block_sparse_moe.experts.7.w2",
+ "model.layers.25.block_sparse_moe.experts.7.w3",
+ "model.layers.26.self_attn.q_proj",
+ "model.layers.26.self_attn.k_proj",
+ "model.layers.26.self_attn.v_proj",
+ "model.layers.26.self_attn.o_proj",
+ "model.layers.26.block_sparse_moe.gate",
+ "model.layers.26.block_sparse_moe.experts.0.w1",
+ "model.layers.26.block_sparse_moe.experts.0.w2",
+ "model.layers.26.block_sparse_moe.experts.0.w3",
+ "model.layers.26.block_sparse_moe.experts.1.w1",
+ "model.layers.26.block_sparse_moe.experts.1.w2",
+ "model.layers.26.block_sparse_moe.experts.1.w3",
+ "model.layers.26.block_sparse_moe.experts.2.w1",
+ "model.layers.26.block_sparse_moe.experts.2.w2",
+ "model.layers.26.block_sparse_moe.experts.2.w3",
+ "model.layers.26.block_sparse_moe.experts.3.w1",
+ "model.layers.26.block_sparse_moe.experts.3.w2",
+ "model.layers.26.block_sparse_moe.experts.3.w3",
+ "model.layers.26.block_sparse_moe.experts.4.w1",
+ "model.layers.26.block_sparse_moe.experts.4.w2",
+ "model.layers.26.block_sparse_moe.experts.4.w3",
+ "model.layers.26.block_sparse_moe.experts.5.w1",
+ "model.layers.26.block_sparse_moe.experts.5.w2",
+ "model.layers.26.block_sparse_moe.experts.5.w3",
+ "model.layers.26.block_sparse_moe.experts.6.w1",
+ "model.layers.26.block_sparse_moe.experts.6.w2",
+ "model.layers.26.block_sparse_moe.experts.6.w3",
+ "model.layers.26.block_sparse_moe.experts.7.w1",
+ "model.layers.26.block_sparse_moe.experts.7.w2",
+ "model.layers.26.block_sparse_moe.experts.7.w3",
890
+ "model.layers.27.self_attn.q_proj",
+ "model.layers.27.self_attn.k_proj",
+ "model.layers.27.self_attn.v_proj",
+ "model.layers.27.self_attn.o_proj",
+ "model.layers.27.block_sparse_moe.gate",
+ "model.layers.27.block_sparse_moe.experts.0.w1",
+ "model.layers.27.block_sparse_moe.experts.0.w2",
+ "model.layers.27.block_sparse_moe.experts.0.w3",
+ "model.layers.27.block_sparse_moe.experts.1.w1",
+ "model.layers.27.block_sparse_moe.experts.1.w2",
+ "model.layers.27.block_sparse_moe.experts.1.w3",
+ "model.layers.27.block_sparse_moe.experts.2.w1",
+ "model.layers.27.block_sparse_moe.experts.2.w2",
+ "model.layers.27.block_sparse_moe.experts.2.w3",
+ "model.layers.27.block_sparse_moe.experts.3.w1",
+ "model.layers.27.block_sparse_moe.experts.3.w2",
+ "model.layers.27.block_sparse_moe.experts.3.w3",
+ "model.layers.27.block_sparse_moe.experts.4.w1",
+ "model.layers.27.block_sparse_moe.experts.4.w2",
+ "model.layers.27.block_sparse_moe.experts.4.w3",
+ "model.layers.27.block_sparse_moe.experts.5.w1",
+ "model.layers.27.block_sparse_moe.experts.5.w2",
+ "model.layers.27.block_sparse_moe.experts.5.w3",
+ "model.layers.27.block_sparse_moe.experts.6.w1",
+ "model.layers.27.block_sparse_moe.experts.6.w2",
+ "model.layers.27.block_sparse_moe.experts.6.w3",
+ "model.layers.27.block_sparse_moe.experts.7.w1",
+ "model.layers.27.block_sparse_moe.experts.7.w2",
+ "model.layers.27.block_sparse_moe.experts.7.w3",
+ "model.layers.28.self_attn.q_proj",
+ "model.layers.28.self_attn.k_proj",
+ "model.layers.28.self_attn.v_proj",
+ "model.layers.28.self_attn.o_proj",
+ "model.layers.28.block_sparse_moe.gate",
+ "model.layers.28.block_sparse_moe.experts.0.w1",
+ "model.layers.28.block_sparse_moe.experts.0.w2",
+ "model.layers.28.block_sparse_moe.experts.0.w3",
+ "model.layers.28.block_sparse_moe.experts.1.w1",
+ "model.layers.28.block_sparse_moe.experts.1.w2",
+ "model.layers.28.block_sparse_moe.experts.1.w3",
+ "model.layers.28.block_sparse_moe.experts.2.w1",
+ "model.layers.28.block_sparse_moe.experts.2.w2",
+ "model.layers.28.block_sparse_moe.experts.2.w3",
+ "model.layers.28.block_sparse_moe.experts.3.w1",
+ "model.layers.28.block_sparse_moe.experts.3.w2",
+ "model.layers.28.block_sparse_moe.experts.3.w3",
+ "model.layers.28.block_sparse_moe.experts.4.w1",
+ "model.layers.28.block_sparse_moe.experts.4.w2",
+ "model.layers.28.block_sparse_moe.experts.4.w3",
+ "model.layers.28.block_sparse_moe.experts.5.w1",
+ "model.layers.28.block_sparse_moe.experts.5.w2",
+ "model.layers.28.block_sparse_moe.experts.5.w3",
+ "model.layers.28.block_sparse_moe.experts.6.w1",
+ "model.layers.28.block_sparse_moe.experts.6.w2",
+ "model.layers.28.block_sparse_moe.experts.6.w3",
+ "model.layers.28.block_sparse_moe.experts.7.w1",
+ "model.layers.28.block_sparse_moe.experts.7.w2",
+ "model.layers.28.block_sparse_moe.experts.7.w3",
+ "model.layers.29.self_attn.q_proj",
+ "model.layers.29.self_attn.k_proj",
+ "model.layers.29.self_attn.v_proj",
+ "model.layers.29.self_attn.o_proj",
+ "model.layers.29.block_sparse_moe.gate",
+ "model.layers.29.block_sparse_moe.experts.0.w1",
+ "model.layers.29.block_sparse_moe.experts.0.w2",
+ "model.layers.29.block_sparse_moe.experts.0.w3",
+ "model.layers.29.block_sparse_moe.experts.1.w1",
+ "model.layers.29.block_sparse_moe.experts.1.w2",
+ "model.layers.29.block_sparse_moe.experts.1.w3",
+ "model.layers.29.block_sparse_moe.experts.2.w1",
+ "model.layers.29.block_sparse_moe.experts.2.w2",
+ "model.layers.29.block_sparse_moe.experts.2.w3",
+ "model.layers.29.block_sparse_moe.experts.3.w1",
+ "model.layers.29.block_sparse_moe.experts.3.w2",
+ "model.layers.29.block_sparse_moe.experts.3.w3",
+ "model.layers.29.block_sparse_moe.experts.4.w1",
+ "model.layers.29.block_sparse_moe.experts.4.w2",
+ "model.layers.29.block_sparse_moe.experts.4.w3",
+ "model.layers.29.block_sparse_moe.experts.5.w1",
+ "model.layers.29.block_sparse_moe.experts.5.w2",
+ "model.layers.29.block_sparse_moe.experts.5.w3",
+ "model.layers.29.block_sparse_moe.experts.6.w1",
+ "model.layers.29.block_sparse_moe.experts.6.w2",
+ "model.layers.29.block_sparse_moe.experts.6.w3",
+ "model.layers.29.block_sparse_moe.experts.7.w1",
+ "model.layers.29.block_sparse_moe.experts.7.w2",
+ "model.layers.29.block_sparse_moe.experts.7.w3",
+ "model.layers.30.self_attn.q_proj",
+ "model.layers.30.self_attn.k_proj",
+ "model.layers.30.self_attn.v_proj",
+ "model.layers.30.self_attn.o_proj",
+ "model.layers.30.block_sparse_moe.gate",
+ "model.layers.30.block_sparse_moe.experts.0.w1",
+ "model.layers.30.block_sparse_moe.experts.0.w2",
+ "model.layers.30.block_sparse_moe.experts.0.w3",
+ "model.layers.30.block_sparse_moe.experts.1.w1",
+ "model.layers.30.block_sparse_moe.experts.1.w2",
+ "model.layers.30.block_sparse_moe.experts.1.w3",
+ "model.layers.30.block_sparse_moe.experts.2.w1",
+ "model.layers.30.block_sparse_moe.experts.2.w2",
+ "model.layers.30.block_sparse_moe.experts.2.w3",
+ "model.layers.30.block_sparse_moe.experts.3.w1",
+ "model.layers.30.block_sparse_moe.experts.3.w2",
+ "model.layers.30.block_sparse_moe.experts.3.w3",
+ "model.layers.30.block_sparse_moe.experts.4.w1",
+ "model.layers.30.block_sparse_moe.experts.4.w2",
+ "model.layers.30.block_sparse_moe.experts.4.w3",
+ "model.layers.30.block_sparse_moe.experts.5.w1",
+ "model.layers.30.block_sparse_moe.experts.5.w2",
+ "model.layers.30.block_sparse_moe.experts.5.w3",
+ "model.layers.30.block_sparse_moe.experts.6.w1",
+ "model.layers.30.block_sparse_moe.experts.6.w2",
+ "model.layers.30.block_sparse_moe.experts.6.w3",
+ "model.layers.30.block_sparse_moe.experts.7.w1",
+ "model.layers.30.block_sparse_moe.experts.7.w2",
+ "model.layers.30.block_sparse_moe.experts.7.w3",
+ "model.layers.31.self_attn.q_proj",
+ "model.layers.31.self_attn.k_proj",
+ "model.layers.31.self_attn.v_proj",
+ "model.layers.31.self_attn.o_proj",
+ "model.layers.31.block_sparse_moe.gate",
+ "model.layers.31.block_sparse_moe.experts.0.w1",
+ "model.layers.31.block_sparse_moe.experts.0.w2",
+ "model.layers.31.block_sparse_moe.experts.0.w3",
+ "model.layers.31.block_sparse_moe.experts.1.w1",
+ "model.layers.31.block_sparse_moe.experts.1.w2",
+ "model.layers.31.block_sparse_moe.experts.1.w3",
+ "model.layers.31.block_sparse_moe.experts.2.w1",
+ "model.layers.31.block_sparse_moe.experts.2.w2",
+ "model.layers.31.block_sparse_moe.experts.2.w3",
+ "model.layers.31.block_sparse_moe.experts.3.w1",
+ "model.layers.31.block_sparse_moe.experts.3.w2",
+ "model.layers.31.block_sparse_moe.experts.3.w3",
+ "model.layers.31.block_sparse_moe.experts.4.w1",
+ "model.layers.31.block_sparse_moe.experts.4.w2",
+ "model.layers.31.block_sparse_moe.experts.4.w3",
+ "model.layers.31.block_sparse_moe.experts.5.w1",
+ "model.layers.31.block_sparse_moe.experts.5.w2",
+ "model.layers.31.block_sparse_moe.experts.5.w3",
+ "model.layers.31.block_sparse_moe.experts.6.w1",
+ "model.layers.31.block_sparse_moe.experts.6.w2",
+ "model.layers.31.block_sparse_moe.experts.6.w3",
+ "model.layers.31.block_sparse_moe.experts.7.w1",
+ "model.layers.31.block_sparse_moe.experts.7.w2",
+ "model.layers.31.block_sparse_moe.experts.7.w3",
+ "model.layers.32.self_attn.q_proj",
+ "model.layers.32.self_attn.k_proj",
+ "model.layers.32.self_attn.v_proj",
+ "model.layers.32.self_attn.o_proj",
+ "model.layers.32.block_sparse_moe.gate",
+ "model.layers.32.block_sparse_moe.experts.0.w1",
+ "model.layers.32.block_sparse_moe.experts.0.w2",
+ "model.layers.32.block_sparse_moe.experts.0.w3",
+ "model.layers.32.block_sparse_moe.experts.1.w1",
+ "model.layers.32.block_sparse_moe.experts.1.w2",
+ "model.layers.32.block_sparse_moe.experts.1.w3",
+ "model.layers.32.block_sparse_moe.experts.2.w1",
+ "model.layers.32.block_sparse_moe.experts.2.w2",
+ "model.layers.32.block_sparse_moe.experts.2.w3",
+ "model.layers.32.block_sparse_moe.experts.3.w1",
+ "model.layers.32.block_sparse_moe.experts.3.w2",
+ "model.layers.32.block_sparse_moe.experts.3.w3",
+ "model.layers.32.block_sparse_moe.experts.4.w1",
+ "model.layers.32.block_sparse_moe.experts.4.w2",
+ "model.layers.32.block_sparse_moe.experts.4.w3",
+ "model.layers.32.block_sparse_moe.experts.5.w1",
+ "model.layers.32.block_sparse_moe.experts.5.w2",
+ "model.layers.32.block_sparse_moe.experts.5.w3",
+ "model.layers.32.block_sparse_moe.experts.6.w1",
+ "model.layers.32.block_sparse_moe.experts.6.w2",
+ "model.layers.32.block_sparse_moe.experts.6.w3",
1061
+ "model.layers.32.block_sparse_moe.experts.7.w1",
1062
+ "model.layers.32.block_sparse_moe.experts.7.w2",
1063
+ "model.layers.32.block_sparse_moe.experts.7.w3",
1064
+ "model.layers.33.self_attn.q_proj",
1065
+ "model.layers.33.self_attn.k_proj",
1066
+ "model.layers.33.self_attn.v_proj",
1067
+ "model.layers.33.self_attn.o_proj",
1068
+ "model.layers.33.block_sparse_moe.gate",
1069
+ "model.layers.33.block_sparse_moe.experts.0.w1",
1070
+ "model.layers.33.block_sparse_moe.experts.0.w2",
1071
+ "model.layers.33.block_sparse_moe.experts.0.w3",
1072
+ "model.layers.33.block_sparse_moe.experts.1.w1",
1073
+ "model.layers.33.block_sparse_moe.experts.1.w2",
1074
+ "model.layers.33.block_sparse_moe.experts.1.w3",
1075
+ "model.layers.33.block_sparse_moe.experts.2.w1",
1076
+ "model.layers.33.block_sparse_moe.experts.2.w2",
1077
+ "model.layers.33.block_sparse_moe.experts.2.w3",
1078
+ "model.layers.33.block_sparse_moe.experts.3.w1",
1079
+ "model.layers.33.block_sparse_moe.experts.3.w2",
1080
+ "model.layers.33.block_sparse_moe.experts.3.w3",
1081
+ "model.layers.33.block_sparse_moe.experts.4.w1",
1082
+ "model.layers.33.block_sparse_moe.experts.4.w2",
1083
+ "model.layers.33.block_sparse_moe.experts.4.w3",
1084
+ "model.layers.33.block_sparse_moe.experts.5.w1",
1085
+ "model.layers.33.block_sparse_moe.experts.5.w2",
1086
+ "model.layers.33.block_sparse_moe.experts.5.w3",
1087
+ "model.layers.33.block_sparse_moe.experts.6.w1",
1088
+ "model.layers.33.block_sparse_moe.experts.6.w2",
1089
+ "model.layers.33.block_sparse_moe.experts.6.w3",
1090
+ "model.layers.33.block_sparse_moe.experts.7.w1",
1091
+ "model.layers.33.block_sparse_moe.experts.7.w2",
1092
+ "model.layers.33.block_sparse_moe.experts.7.w3",
1093
+ "model.layers.34.self_attn.q_proj",
1094
+ "model.layers.34.self_attn.k_proj",
1095
+ "model.layers.34.self_attn.v_proj",
1096
+ "model.layers.34.self_attn.o_proj",
1097
+ "model.layers.34.block_sparse_moe.gate",
1098
+ "model.layers.34.block_sparse_moe.experts.0.w1",
1099
+ "model.layers.34.block_sparse_moe.experts.0.w2",
1100
+ "model.layers.34.block_sparse_moe.experts.0.w3",
1101
+ "model.layers.34.block_sparse_moe.experts.1.w1",
1102
+ "model.layers.34.block_sparse_moe.experts.1.w2",
1103
+ "model.layers.34.block_sparse_moe.experts.1.w3",
1104
+ "model.layers.34.block_sparse_moe.experts.2.w1",
1105
+ "model.layers.34.block_sparse_moe.experts.2.w2",
1106
+ "model.layers.34.block_sparse_moe.experts.2.w3",
1107
+ "model.layers.34.block_sparse_moe.experts.3.w1",
1108
+ "model.layers.34.block_sparse_moe.experts.3.w2",
1109
+ "model.layers.34.block_sparse_moe.experts.3.w3",
1110
+ "model.layers.34.block_sparse_moe.experts.4.w1",
1111
+ "model.layers.34.block_sparse_moe.experts.4.w2",
1112
+ "model.layers.34.block_sparse_moe.experts.4.w3",
1113
+ "model.layers.34.block_sparse_moe.experts.5.w1",
1114
+ "model.layers.34.block_sparse_moe.experts.5.w2",
1115
+ "model.layers.34.block_sparse_moe.experts.5.w3",
1116
+ "model.layers.34.block_sparse_moe.experts.6.w1",
1117
+ "model.layers.34.block_sparse_moe.experts.6.w2",
1118
+ "model.layers.34.block_sparse_moe.experts.6.w3",
1119
+ "model.layers.34.block_sparse_moe.experts.7.w1",
1120
+ "model.layers.34.block_sparse_moe.experts.7.w2",
1121
+ "model.layers.34.block_sparse_moe.experts.7.w3",
1122
+ "model.layers.35.self_attn.q_proj",
1123
+ "model.layers.35.self_attn.k_proj",
1124
+ "model.layers.35.self_attn.v_proj",
1125
+ "model.layers.35.self_attn.o_proj",
1126
+ "model.layers.35.block_sparse_moe.gate",
1127
+ "model.layers.35.block_sparse_moe.experts.0.w1",
1128
+ "model.layers.35.block_sparse_moe.experts.0.w2",
1129
+ "model.layers.35.block_sparse_moe.experts.0.w3",
1130
+ "model.layers.35.block_sparse_moe.experts.1.w1",
1131
+ "model.layers.35.block_sparse_moe.experts.1.w2",
1132
+ "model.layers.35.block_sparse_moe.experts.1.w3",
1133
+ "model.layers.35.block_sparse_moe.experts.2.w1",
1134
+ "model.layers.35.block_sparse_moe.experts.2.w2",
1135
+ "model.layers.35.block_sparse_moe.experts.2.w3",
1136
+ "model.layers.35.block_sparse_moe.experts.3.w1",
1137
+ "model.layers.35.block_sparse_moe.experts.3.w2",
1138
+ "model.layers.35.block_sparse_moe.experts.3.w3",
1139
+ "model.layers.35.block_sparse_moe.experts.4.w1",
1140
+ "model.layers.35.block_sparse_moe.experts.4.w2",
1141
+ "model.layers.35.block_sparse_moe.experts.4.w3",
1142
+ "model.layers.35.block_sparse_moe.experts.5.w1",
1143
+ "model.layers.35.block_sparse_moe.experts.5.w2",
1144
+ "model.layers.35.block_sparse_moe.experts.5.w3",
1145
+ "model.layers.35.block_sparse_moe.experts.6.w1",
1146
+ "model.layers.35.block_sparse_moe.experts.6.w2",
1147
+ "model.layers.35.block_sparse_moe.experts.6.w3",
1148
+ "model.layers.35.block_sparse_moe.experts.7.w1",
1149
+ "model.layers.35.block_sparse_moe.experts.7.w2",
1150
+ "model.layers.35.block_sparse_moe.experts.7.w3",
1151
+ "model.layers.36.self_attn.q_proj",
1152
+ "model.layers.36.self_attn.k_proj",
1153
+ "model.layers.36.self_attn.v_proj",
1154
+ "model.layers.36.self_attn.o_proj",
1155
+ "model.layers.36.block_sparse_moe.gate",
1156
+ "model.layers.36.block_sparse_moe.experts.0.w1",
1157
+ "model.layers.36.block_sparse_moe.experts.0.w2",
1158
+ "model.layers.36.block_sparse_moe.experts.0.w3",
1159
+ "model.layers.36.block_sparse_moe.experts.1.w1",
1160
+ "model.layers.36.block_sparse_moe.experts.1.w2",
1161
+ "model.layers.36.block_sparse_moe.experts.1.w3",
1162
+ "model.layers.36.block_sparse_moe.experts.2.w1",
1163
+ "model.layers.36.block_sparse_moe.experts.2.w2",
1164
+ "model.layers.36.block_sparse_moe.experts.2.w3",
1165
+ "model.layers.36.block_sparse_moe.experts.3.w1",
1166
+ "model.layers.36.block_sparse_moe.experts.3.w2",
1167
+ "model.layers.36.block_sparse_moe.experts.3.w3",
1168
+ "model.layers.36.block_sparse_moe.experts.4.w1",
1169
+ "model.layers.36.block_sparse_moe.experts.4.w2",
1170
+ "model.layers.36.block_sparse_moe.experts.4.w3",
1171
+ "model.layers.36.block_sparse_moe.experts.5.w1",
1172
+ "model.layers.36.block_sparse_moe.experts.5.w2",
1173
+ "model.layers.36.block_sparse_moe.experts.5.w3",
1174
+ "model.layers.36.block_sparse_moe.experts.6.w1",
1175
+ "model.layers.36.block_sparse_moe.experts.6.w2",
1176
+ "model.layers.36.block_sparse_moe.experts.6.w3",
1177
+ "model.layers.36.block_sparse_moe.experts.7.w1",
1178
+ "model.layers.36.block_sparse_moe.experts.7.w2",
1179
+ "model.layers.36.block_sparse_moe.experts.7.w3",
1180
+ "model.layers.37.self_attn.q_proj",
1181
+ "model.layers.37.self_attn.k_proj",
1182
+ "model.layers.37.self_attn.v_proj",
1183
+ "model.layers.37.self_attn.o_proj",
1184
+ "model.layers.37.block_sparse_moe.gate",
1185
+ "model.layers.37.block_sparse_moe.experts.0.w1",
1186
+ "model.layers.37.block_sparse_moe.experts.0.w2",
1187
+ "model.layers.37.block_sparse_moe.experts.0.w3",
1188
+ "model.layers.37.block_sparse_moe.experts.1.w1",
1189
+ "model.layers.37.block_sparse_moe.experts.1.w2",
1190
+ "model.layers.37.block_sparse_moe.experts.1.w3",
1191
+ "model.layers.37.block_sparse_moe.experts.2.w1",
1192
+ "model.layers.37.block_sparse_moe.experts.2.w2",
1193
+ "model.layers.37.block_sparse_moe.experts.2.w3",
1194
+ "model.layers.37.block_sparse_moe.experts.3.w1",
1195
+ "model.layers.37.block_sparse_moe.experts.3.w2",
1196
+ "model.layers.37.block_sparse_moe.experts.3.w3",
1197
+ "model.layers.37.block_sparse_moe.experts.4.w1",
1198
+ "model.layers.37.block_sparse_moe.experts.4.w2",
1199
+ "model.layers.37.block_sparse_moe.experts.4.w3",
1200
+ "model.layers.37.block_sparse_moe.experts.5.w1",
1201
+ "model.layers.37.block_sparse_moe.experts.5.w2",
1202
+ "model.layers.37.block_sparse_moe.experts.5.w3",
1203
+ "model.layers.37.block_sparse_moe.experts.6.w1",
1204
+ "model.layers.37.block_sparse_moe.experts.6.w2",
1205
+ "model.layers.37.block_sparse_moe.experts.6.w3",
1206
+ "model.layers.37.block_sparse_moe.experts.7.w1",
1207
+ "model.layers.37.block_sparse_moe.experts.7.w2",
1208
+ "model.layers.37.block_sparse_moe.experts.7.w3",
1209
+ "model.layers.38.self_attn.q_proj",
1210
+ "model.layers.38.self_attn.k_proj",
1211
+ "model.layers.38.self_attn.v_proj",
1212
+ "model.layers.38.self_attn.o_proj",
1213
+ "model.layers.38.block_sparse_moe.gate",
1214
+ "model.layers.38.block_sparse_moe.experts.0.w1",
1215
+ "model.layers.38.block_sparse_moe.experts.0.w2",
1216
+ "model.layers.38.block_sparse_moe.experts.0.w3",
1217
+ "model.layers.38.block_sparse_moe.experts.1.w1",
1218
+ "model.layers.38.block_sparse_moe.experts.1.w2",
1219
+ "model.layers.38.block_sparse_moe.experts.1.w3",
1220
+ "model.layers.38.block_sparse_moe.experts.2.w1",
1221
+ "model.layers.38.block_sparse_moe.experts.2.w2",
1222
+ "model.layers.38.block_sparse_moe.experts.2.w3",
1223
+ "model.layers.38.block_sparse_moe.experts.3.w1",
1224
+ "model.layers.38.block_sparse_moe.experts.3.w2",
1225
+ "model.layers.38.block_sparse_moe.experts.3.w3",
1226
+ "model.layers.38.block_sparse_moe.experts.4.w1",
1227
+ "model.layers.38.block_sparse_moe.experts.4.w2",
1228
+ "model.layers.38.block_sparse_moe.experts.4.w3",
1229
+ "model.layers.38.block_sparse_moe.experts.5.w1",
1230
+ "model.layers.38.block_sparse_moe.experts.5.w2",
1231
+ "model.layers.38.block_sparse_moe.experts.5.w3",
1232
+ "model.layers.38.block_sparse_moe.experts.6.w1",
1233
+ "model.layers.38.block_sparse_moe.experts.6.w2",
1234
+ "model.layers.38.block_sparse_moe.experts.6.w3",
1235
+ "model.layers.38.block_sparse_moe.experts.7.w1",
1236
+ "model.layers.38.block_sparse_moe.experts.7.w2",
1237
+ "model.layers.38.block_sparse_moe.experts.7.w3",
1238
+ "model.layers.39.self_attn.q_proj",
1239
+ "model.layers.39.self_attn.k_proj",
1240
+ "model.layers.39.self_attn.v_proj",
1241
+ "model.layers.39.self_attn.o_proj",
1242
+ "model.layers.39.block_sparse_moe.gate",
1243
+ "model.layers.39.block_sparse_moe.experts.0.w1",
1244
+ "model.layers.39.block_sparse_moe.experts.0.w2",
1245
+ "model.layers.39.block_sparse_moe.experts.0.w3",
1246
+ "model.layers.39.block_sparse_moe.experts.1.w1",
1247
+ "model.layers.39.block_sparse_moe.experts.1.w2",
1248
+ "model.layers.39.block_sparse_moe.experts.1.w3",
1249
+ "model.layers.39.block_sparse_moe.experts.2.w1",
1250
+ "model.layers.39.block_sparse_moe.experts.2.w2",
1251
+ "model.layers.39.block_sparse_moe.experts.2.w3",
1252
+ "model.layers.39.block_sparse_moe.experts.3.w1",
1253
+ "model.layers.39.block_sparse_moe.experts.3.w2",
1254
+ "model.layers.39.block_sparse_moe.experts.3.w3",
1255
+ "model.layers.39.block_sparse_moe.experts.4.w1",
1256
+ "model.layers.39.block_sparse_moe.experts.4.w2",
1257
+ "model.layers.39.block_sparse_moe.experts.4.w3",
1258
+ "model.layers.39.block_sparse_moe.experts.5.w1",
1259
+ "model.layers.39.block_sparse_moe.experts.5.w2",
1260
+ "model.layers.39.block_sparse_moe.experts.5.w3",
1261
+ "model.layers.39.block_sparse_moe.experts.6.w1",
1262
+ "model.layers.39.block_sparse_moe.experts.6.w2",
1263
+ "model.layers.39.block_sparse_moe.experts.6.w3",
1264
+ "model.layers.39.block_sparse_moe.experts.7.w1",
1265
+ "model.layers.39.block_sparse_moe.experts.7.w2",
1266
+ "model.layers.39.block_sparse_moe.experts.7.w3",
1267
+ "model.layers.40.self_attn.q_proj",
1268
+ "model.layers.40.self_attn.k_proj",
1269
+ "model.layers.40.self_attn.v_proj",
1270
+ "model.layers.40.self_attn.o_proj",
1271
+ "model.layers.40.block_sparse_moe.gate",
1272
+ "model.layers.40.block_sparse_moe.experts.0.w1",
1273
+ "model.layers.40.block_sparse_moe.experts.0.w2",
1274
+ "model.layers.40.block_sparse_moe.experts.0.w3",
1275
+ "model.layers.40.block_sparse_moe.experts.1.w1",
1276
+ "model.layers.40.block_sparse_moe.experts.1.w2",
1277
+ "model.layers.40.block_sparse_moe.experts.1.w3",
1278
+ "model.layers.40.block_sparse_moe.experts.2.w1",
1279
+ "model.layers.40.block_sparse_moe.experts.2.w2",
1280
+ "model.layers.40.block_sparse_moe.experts.2.w3",
1281
+ "model.layers.40.block_sparse_moe.experts.3.w1",
1282
+ "model.layers.40.block_sparse_moe.experts.3.w2",
1283
+ "model.layers.40.block_sparse_moe.experts.3.w3",
1284
+ "model.layers.40.block_sparse_moe.experts.4.w1",
1285
+ "model.layers.40.block_sparse_moe.experts.4.w2",
1286
+ "model.layers.40.block_sparse_moe.experts.4.w3",
1287
+ "model.layers.40.block_sparse_moe.experts.5.w1",
1288
+ "model.layers.40.block_sparse_moe.experts.5.w2",
1289
+ "model.layers.40.block_sparse_moe.experts.5.w3",
1290
+ "model.layers.40.block_sparse_moe.experts.6.w1",
1291
+ "model.layers.40.block_sparse_moe.experts.6.w2",
1292
+ "model.layers.40.block_sparse_moe.experts.6.w3",
1293
+ "model.layers.40.block_sparse_moe.experts.7.w1",
1294
+ "model.layers.40.block_sparse_moe.experts.7.w2",
1295
+ "model.layers.40.block_sparse_moe.experts.7.w3",
1296
+ "model.layers.41.self_attn.q_proj",
1297
+ "model.layers.41.self_attn.k_proj",
1298
+ "model.layers.41.self_attn.v_proj",
1299
+ "model.layers.41.self_attn.o_proj",
1300
+ "model.layers.41.block_sparse_moe.gate",
1301
+ "model.layers.41.block_sparse_moe.experts.0.w1",
1302
+ "model.layers.41.block_sparse_moe.experts.0.w2",
1303
+ "model.layers.41.block_sparse_moe.experts.0.w3",
1304
+ "model.layers.41.block_sparse_moe.experts.1.w1",
1305
+ "model.layers.41.block_sparse_moe.experts.1.w2",
1306
+ "model.layers.41.block_sparse_moe.experts.1.w3",
1307
+ "model.layers.41.block_sparse_moe.experts.2.w1",
1308
+ "model.layers.41.block_sparse_moe.experts.2.w2",
1309
+ "model.layers.41.block_sparse_moe.experts.2.w3",
1310
+ "model.layers.41.block_sparse_moe.experts.3.w1",
1311
+ "model.layers.41.block_sparse_moe.experts.3.w2",
1312
+ "model.layers.41.block_sparse_moe.experts.3.w3",
1313
+ "model.layers.41.block_sparse_moe.experts.4.w1",
1314
+ "model.layers.41.block_sparse_moe.experts.4.w2",
1315
+ "model.layers.41.block_sparse_moe.experts.4.w3",
1316
+ "model.layers.41.block_sparse_moe.experts.5.w1",
1317
+ "model.layers.41.block_sparse_moe.experts.5.w2",
1318
+ "model.layers.41.block_sparse_moe.experts.5.w3",
1319
+ "model.layers.41.block_sparse_moe.experts.6.w1",
1320
+ "model.layers.41.block_sparse_moe.experts.6.w2",
1321
+ "model.layers.41.block_sparse_moe.experts.6.w3",
1322
+ "model.layers.41.block_sparse_moe.experts.7.w1",
1323
+ "model.layers.41.block_sparse_moe.experts.7.w2",
1324
+ "model.layers.41.block_sparse_moe.experts.7.w3",
1325
+ "model.layers.42.self_attn.q_proj",
1326
+ "model.layers.42.self_attn.k_proj",
1327
+ "model.layers.42.self_attn.v_proj",
1328
+ "model.layers.42.self_attn.o_proj",
1329
+ "model.layers.42.block_sparse_moe.gate",
1330
+ "model.layers.42.block_sparse_moe.experts.0.w1",
1331
+ "model.layers.42.block_sparse_moe.experts.0.w2",
1332
+ "model.layers.42.block_sparse_moe.experts.0.w3",
1333
+ "model.layers.42.block_sparse_moe.experts.1.w1",
1334
+ "model.layers.42.block_sparse_moe.experts.1.w2",
1335
+ "model.layers.42.block_sparse_moe.experts.1.w3",
1336
+ "model.layers.42.block_sparse_moe.experts.2.w1",
1337
+ "model.layers.42.block_sparse_moe.experts.2.w2",
1338
+ "model.layers.42.block_sparse_moe.experts.2.w3",
1339
+ "model.layers.42.block_sparse_moe.experts.3.w1",
1340
+ "model.layers.42.block_sparse_moe.experts.3.w2",
1341
+ "model.layers.42.block_sparse_moe.experts.3.w3",
1342
+ "model.layers.42.block_sparse_moe.experts.4.w1",
1343
+ "model.layers.42.block_sparse_moe.experts.4.w2",
1344
+ "model.layers.42.block_sparse_moe.experts.4.w3",
1345
+ "model.layers.42.block_sparse_moe.experts.5.w1",
1346
+ "model.layers.42.block_sparse_moe.experts.5.w2",
1347
+ "model.layers.42.block_sparse_moe.experts.5.w3",
1348
+ "model.layers.42.block_sparse_moe.experts.6.w1",
1349
+ "model.layers.42.block_sparse_moe.experts.6.w2",
1350
+ "model.layers.42.block_sparse_moe.experts.6.w3",
1351
+ "model.layers.42.block_sparse_moe.experts.7.w1",
1352
+ "model.layers.42.block_sparse_moe.experts.7.w2",
1353
+ "model.layers.42.block_sparse_moe.experts.7.w3",
1354
+ "model.layers.43.self_attn.q_proj",
1355
+ "model.layers.43.self_attn.k_proj",
1356
+ "model.layers.43.self_attn.v_proj",
1357
+ "model.layers.43.self_attn.o_proj",
1358
+ "model.layers.43.block_sparse_moe.gate",
1359
+ "model.layers.43.block_sparse_moe.experts.0.w1",
1360
+ "model.layers.43.block_sparse_moe.experts.0.w2",
1361
+ "model.layers.43.block_sparse_moe.experts.0.w3",
1362
+ "model.layers.43.block_sparse_moe.experts.1.w1",
1363
+ "model.layers.43.block_sparse_moe.experts.1.w2",
1364
+ "model.layers.43.block_sparse_moe.experts.1.w3",
1365
+ "model.layers.43.block_sparse_moe.experts.2.w1",
1366
+ "model.layers.43.block_sparse_moe.experts.2.w2",
1367
+ "model.layers.43.block_sparse_moe.experts.2.w3",
1368
+ "model.layers.43.block_sparse_moe.experts.3.w1",
1369
+ "model.layers.43.block_sparse_moe.experts.3.w2",
1370
+ "model.layers.43.block_sparse_moe.experts.3.w3",
1371
+ "model.layers.43.block_sparse_moe.experts.4.w1",
1372
+ "model.layers.43.block_sparse_moe.experts.4.w2",
1373
+ "model.layers.43.block_sparse_moe.experts.4.w3",
1374
+ "model.layers.43.block_sparse_moe.experts.5.w1",
1375
+ "model.layers.43.block_sparse_moe.experts.5.w2",
1376
+ "model.layers.43.block_sparse_moe.experts.5.w3",
1377
+ "model.layers.43.block_sparse_moe.experts.6.w1",
1378
+ "model.layers.43.block_sparse_moe.experts.6.w2",
1379
+ "model.layers.43.block_sparse_moe.experts.6.w3",
1380
+ "model.layers.43.block_sparse_moe.experts.7.w1",
1381
+ "model.layers.43.block_sparse_moe.experts.7.w2",
1382
+ "model.layers.43.block_sparse_moe.experts.7.w3",
1383
+ "model.layers.44.self_attn.q_proj",
1384
+ "model.layers.44.self_attn.k_proj",
1385
+ "model.layers.44.self_attn.v_proj",
1386
+ "model.layers.44.self_attn.o_proj",
1387
+ "model.layers.44.block_sparse_moe.gate",
1388
+ "model.layers.44.block_sparse_moe.experts.0.w1",
1389
+ "model.layers.44.block_sparse_moe.experts.0.w2",
1390
+ "model.layers.44.block_sparse_moe.experts.0.w3",
1391
+ "model.layers.44.block_sparse_moe.experts.1.w1",
1392
+ "model.layers.44.block_sparse_moe.experts.1.w2",
1393
+ "model.layers.44.block_sparse_moe.experts.1.w3",
1394
+ "model.layers.44.block_sparse_moe.experts.2.w1",
1395
+ "model.layers.44.block_sparse_moe.experts.2.w2",
1396
+ "model.layers.44.block_sparse_moe.experts.2.w3",
1397
+ "model.layers.44.block_sparse_moe.experts.3.w1",
1398
+ "model.layers.44.block_sparse_moe.experts.3.w2",
1399
+ "model.layers.44.block_sparse_moe.experts.3.w3",
1400
+ "model.layers.44.block_sparse_moe.experts.4.w1",
1401
+ "model.layers.44.block_sparse_moe.experts.4.w2",
1402
+ "model.layers.44.block_sparse_moe.experts.4.w3",
1403
+ "model.layers.44.block_sparse_moe.experts.5.w1",
1404
+ "model.layers.44.block_sparse_moe.experts.5.w2",
1405
+ "model.layers.44.block_sparse_moe.experts.5.w3",
1406
+ "model.layers.44.block_sparse_moe.experts.6.w1",
1407
+ "model.layers.44.block_sparse_moe.experts.6.w2",
1408
+ "model.layers.44.block_sparse_moe.experts.6.w3",
1409
+ "model.layers.44.block_sparse_moe.experts.7.w1",
1410
+ "model.layers.44.block_sparse_moe.experts.7.w2",
1411
+ "model.layers.44.block_sparse_moe.experts.7.w3",
1412
+ "model.layers.45.self_attn.q_proj",
1413
+ "model.layers.45.self_attn.k_proj",
1414
+ "model.layers.45.self_attn.v_proj",
1415
+ "model.layers.45.self_attn.o_proj",
1416
+ "model.layers.45.block_sparse_moe.gate",
1417
+ "model.layers.45.block_sparse_moe.experts.0.w1",
1418
+ "model.layers.45.block_sparse_moe.experts.0.w2",
1419
+ "model.layers.45.block_sparse_moe.experts.0.w3",
1420
+ "model.layers.45.block_sparse_moe.experts.1.w1",
1421
+ "model.layers.45.block_sparse_moe.experts.1.w2",
1422
+ "model.layers.45.block_sparse_moe.experts.1.w3",
1423
+ "model.layers.45.block_sparse_moe.experts.2.w1",
1424
+ "model.layers.45.block_sparse_moe.experts.2.w2",
1425
+ "model.layers.45.block_sparse_moe.experts.2.w3",
1426
+ "model.layers.45.block_sparse_moe.experts.3.w1",
1427
+ "model.layers.45.block_sparse_moe.experts.3.w2",
1428
+ "model.layers.45.block_sparse_moe.experts.3.w3",
1429
+ "model.layers.45.block_sparse_moe.experts.4.w1",
1430
+ "model.layers.45.block_sparse_moe.experts.4.w2",
1431
+ "model.layers.45.block_sparse_moe.experts.4.w3",
1432
+ "model.layers.45.block_sparse_moe.experts.5.w1",
1433
+ "model.layers.45.block_sparse_moe.experts.5.w2",
1434
+ "model.layers.45.block_sparse_moe.experts.5.w3",
1435
+ "model.layers.45.block_sparse_moe.experts.6.w1",
1436
+ "model.layers.45.block_sparse_moe.experts.6.w2",
1437
+ "model.layers.45.block_sparse_moe.experts.6.w3",
1438
+ "model.layers.45.block_sparse_moe.experts.7.w1",
1439
+ "model.layers.45.block_sparse_moe.experts.7.w2",
1440
+ "model.layers.45.block_sparse_moe.experts.7.w3",
1441
+ "model.layers.46.self_attn.q_proj",
1442
+ "model.layers.46.self_attn.k_proj",
1443
+ "model.layers.46.self_attn.v_proj",
1444
+ "model.layers.46.self_attn.o_proj",
1445
+ "model.layers.46.block_sparse_moe.gate",
1446
+ "model.layers.46.block_sparse_moe.experts.0.w1",
1447
+ "model.layers.46.block_sparse_moe.experts.0.w2",
1448
+ "model.layers.46.block_sparse_moe.experts.0.w3",
1449
+ "model.layers.46.block_sparse_moe.experts.1.w1",
1450
+ "model.layers.46.block_sparse_moe.experts.1.w2",
1451
+ "model.layers.46.block_sparse_moe.experts.1.w3",
1452
+ "model.layers.46.block_sparse_moe.experts.2.w1",
1453
+ "model.layers.46.block_sparse_moe.experts.2.w2",
1454
+ "model.layers.46.block_sparse_moe.experts.2.w3",
1455
+ "model.layers.46.block_sparse_moe.experts.3.w1",
1456
+ "model.layers.46.block_sparse_moe.experts.3.w2",
1457
+ "model.layers.46.block_sparse_moe.experts.3.w3",
1458
+ "model.layers.46.block_sparse_moe.experts.4.w1",
1459
+ "model.layers.46.block_sparse_moe.experts.4.w2",
1460
+ "model.layers.46.block_sparse_moe.experts.4.w3",
1461
+ "model.layers.46.block_sparse_moe.experts.5.w1",
1462
+ "model.layers.46.block_sparse_moe.experts.5.w2",
1463
+ "model.layers.46.block_sparse_moe.experts.5.w3",
1464
+ "model.layers.46.block_sparse_moe.experts.6.w1",
1465
+ "model.layers.46.block_sparse_moe.experts.6.w2",
1466
+ "model.layers.46.block_sparse_moe.experts.6.w3",
1467
+ "model.layers.46.block_sparse_moe.experts.7.w1",
1468
+ "model.layers.46.block_sparse_moe.experts.7.w2",
1469
+ "model.layers.46.block_sparse_moe.experts.7.w3",
1470
+ "model.layers.47.self_attn.q_proj",
1471
+ "model.layers.47.self_attn.k_proj",
1472
+ "model.layers.47.self_attn.v_proj",
1473
+ "model.layers.47.self_attn.o_proj",
1474
+ "model.layers.47.block_sparse_moe.gate",
1475
+ "model.layers.47.block_sparse_moe.experts.0.w1",
1476
+ "model.layers.47.block_sparse_moe.experts.0.w2",
1477
+ "model.layers.47.block_sparse_moe.experts.0.w3",
1478
+ "model.layers.47.block_sparse_moe.experts.1.w1",
1479
+ "model.layers.47.block_sparse_moe.experts.1.w2",
1480
+ "model.layers.47.block_sparse_moe.experts.1.w3",
1481
+ "model.layers.47.block_sparse_moe.experts.2.w1",
1482
+ "model.layers.47.block_sparse_moe.experts.2.w2",
1483
+ "model.layers.47.block_sparse_moe.experts.2.w3",
1484
+ "model.layers.47.block_sparse_moe.experts.3.w1",
1485
+ "model.layers.47.block_sparse_moe.experts.3.w2",
1486
+ "model.layers.47.block_sparse_moe.experts.3.w3",
1487
+ "model.layers.47.block_sparse_moe.experts.4.w1",
1488
+ "model.layers.47.block_sparse_moe.experts.4.w2",
1489
+ "model.layers.47.block_sparse_moe.experts.4.w3",
1490
+ "model.layers.47.block_sparse_moe.experts.5.w1",
1491
+ "model.layers.47.block_sparse_moe.experts.5.w2",
1492
+ "model.layers.47.block_sparse_moe.experts.5.w3",
1493
+ "model.layers.47.block_sparse_moe.experts.6.w1",
1494
+ "model.layers.47.block_sparse_moe.experts.6.w2",
1495
+ "model.layers.47.block_sparse_moe.experts.6.w3",
1496
+ "model.layers.47.block_sparse_moe.experts.7.w1",
1497
+ "model.layers.47.block_sparse_moe.experts.7.w2",
1498
+ "model.layers.47.block_sparse_moe.experts.7.w3",
1499
+ "model.layers.48.self_attn.q_proj",
1500
+ "model.layers.48.self_attn.k_proj",
1501
+ "model.layers.48.self_attn.v_proj",
1502
+ "model.layers.48.self_attn.o_proj",
1503
+ "model.layers.48.block_sparse_moe.gate",
1504
+ "model.layers.48.block_sparse_moe.experts.0.w1",
1505
+ "model.layers.48.block_sparse_moe.experts.0.w2",
1506
+ "model.layers.48.block_sparse_moe.experts.0.w3",
1507
+ "model.layers.48.block_sparse_moe.experts.1.w1",
1508
+ "model.layers.48.block_sparse_moe.experts.1.w2",
1509
+ "model.layers.48.block_sparse_moe.experts.1.w3",
1510
+ "model.layers.48.block_sparse_moe.experts.2.w1",
1511
+ "model.layers.48.block_sparse_moe.experts.2.w2",
1512
+ "model.layers.48.block_sparse_moe.experts.2.w3",
1513
+ "model.layers.48.block_sparse_moe.experts.3.w1",
1514
+ "model.layers.48.block_sparse_moe.experts.3.w2",
1515
+ "model.layers.48.block_sparse_moe.experts.3.w3",
1516
+ "model.layers.48.block_sparse_moe.experts.4.w1",
1517
+ "model.layers.48.block_sparse_moe.experts.4.w2",
1518
+ "model.layers.48.block_sparse_moe.experts.4.w3",
1519
+ "model.layers.48.block_sparse_moe.experts.5.w1",
1520
+ "model.layers.48.block_sparse_moe.experts.5.w2",
1521
+ "model.layers.48.block_sparse_moe.experts.5.w3",
1522
+ "model.layers.48.block_sparse_moe.experts.6.w1",
1523
+ "model.layers.48.block_sparse_moe.experts.6.w2",
1524
+ "model.layers.48.block_sparse_moe.experts.6.w3",
1525
+ "model.layers.48.block_sparse_moe.experts.7.w1",
1526
+ "model.layers.48.block_sparse_moe.experts.7.w2",
1527
+ "model.layers.48.block_sparse_moe.experts.7.w3",
1528
+ "model.layers.49.self_attn.q_proj",
1529
+ "model.layers.49.self_attn.k_proj",
1530
+ "model.layers.49.self_attn.v_proj",
1531
+ "model.layers.49.self_attn.o_proj",
1532
+ "model.layers.49.block_sparse_moe.gate",
1533
+ "model.layers.49.block_sparse_moe.experts.0.w1",
1534
+ "model.layers.49.block_sparse_moe.experts.0.w2",
1535
+ "model.layers.49.block_sparse_moe.experts.0.w3",
1536
+ "model.layers.49.block_sparse_moe.experts.1.w1",
1537
+ "model.layers.49.block_sparse_moe.experts.1.w2",
1538
+ "model.layers.49.block_sparse_moe.experts.1.w3",
1539
+ "model.layers.49.block_sparse_moe.experts.2.w1",
1540
+ "model.layers.49.block_sparse_moe.experts.2.w2",
1541
+ "model.layers.49.block_sparse_moe.experts.2.w3",
1542
+ "model.layers.49.block_sparse_moe.experts.3.w1",
1543
+ "model.layers.49.block_sparse_moe.experts.3.w2",
1544
+ "model.layers.49.block_sparse_moe.experts.3.w3",
1545
+ "model.layers.49.block_sparse_moe.experts.4.w1",
1546
+ "model.layers.49.block_sparse_moe.experts.4.w2",
1547
+ "model.layers.49.block_sparse_moe.experts.4.w3",
1548
+ "model.layers.49.block_sparse_moe.experts.5.w1",
1549
+ "model.layers.49.block_sparse_moe.experts.5.w2",
1550
+ "model.layers.49.block_sparse_moe.experts.5.w3",
1551
+ "model.layers.49.block_sparse_moe.experts.6.w1",
1552
+ "model.layers.49.block_sparse_moe.experts.6.w2",
1553
+ "model.layers.49.block_sparse_moe.experts.6.w3",
1554
+ "model.layers.49.block_sparse_moe.experts.7.w1",
1555
+ "model.layers.49.block_sparse_moe.experts.7.w2",
1556
+ "model.layers.49.block_sparse_moe.experts.7.w3",
1557
+ "model.layers.50.self_attn.q_proj",
1558
+ "model.layers.50.self_attn.k_proj",
1559
+ "model.layers.50.self_attn.v_proj",
1560
+ "model.layers.50.self_attn.o_proj",
1561
+ "model.layers.50.block_sparse_moe.gate",
1562
+ "model.layers.50.block_sparse_moe.experts.0.w1",
1563
+ "model.layers.50.block_sparse_moe.experts.0.w2",
1564
+ "model.layers.50.block_sparse_moe.experts.0.w3",
1565
+ "model.layers.50.block_sparse_moe.experts.1.w1",
1566
+ "model.layers.50.block_sparse_moe.experts.1.w2",
1567
+ "model.layers.50.block_sparse_moe.experts.1.w3",
1568
+ "model.layers.50.block_sparse_moe.experts.2.w1",
1569
+ "model.layers.50.block_sparse_moe.experts.2.w2",
1570
+ "model.layers.50.block_sparse_moe.experts.2.w3",
1571
+ "model.layers.50.block_sparse_moe.experts.3.w1",
1572
+ "model.layers.50.block_sparse_moe.experts.3.w2",
1573
+ "model.layers.50.block_sparse_moe.experts.3.w3",
1574
+ "model.layers.50.block_sparse_moe.experts.4.w1",
1575
+ "model.layers.50.block_sparse_moe.experts.4.w2",
1576
+ "model.layers.50.block_sparse_moe.experts.4.w3",
1577
+ "model.layers.50.block_sparse_moe.experts.5.w1",
1578
+ "model.layers.50.block_sparse_moe.experts.5.w2",
1579
+ "model.layers.50.block_sparse_moe.experts.5.w3",
1580
+ "model.layers.50.block_sparse_moe.experts.6.w1",
1581
+ "model.layers.50.block_sparse_moe.experts.6.w2",
1582
+ "model.layers.50.block_sparse_moe.experts.6.w3",
1583
+ "model.layers.50.block_sparse_moe.experts.7.w1",
1584
+ "model.layers.50.block_sparse_moe.experts.7.w2",
1585
+ "model.layers.50.block_sparse_moe.experts.7.w3",
1586
+ "model.layers.51.self_attn.q_proj",
1587
+ "model.layers.51.self_attn.k_proj",
1588
+ "model.layers.51.self_attn.v_proj",
1589
+ "model.layers.51.self_attn.o_proj",
1590
+ "model.layers.51.block_sparse_moe.gate",
1591
+ "model.layers.51.block_sparse_moe.experts.0.w1",
1592
+ "model.layers.51.block_sparse_moe.experts.0.w2",
1593
+ "model.layers.51.block_sparse_moe.experts.0.w3",
1594
+ "model.layers.51.block_sparse_moe.experts.1.w1",
1595
+ "model.layers.51.block_sparse_moe.experts.1.w2",
1596
+ "model.layers.51.block_sparse_moe.experts.1.w3",
1597
+ "model.layers.51.block_sparse_moe.experts.2.w1",
1598
+ "model.layers.51.block_sparse_moe.experts.2.w2",
1599
+ "model.layers.51.block_sparse_moe.experts.2.w3",
1600
+ "model.layers.51.block_sparse_moe.experts.3.w1",
1601
+ "model.layers.51.block_sparse_moe.experts.3.w2",
1602
+ "model.layers.51.block_sparse_moe.experts.3.w3",
1603
+ "model.layers.51.block_sparse_moe.experts.4.w1",
1604
+ "model.layers.51.block_sparse_moe.experts.4.w2",
1605
+ "model.layers.51.block_sparse_moe.experts.4.w3",
1606
+ "model.layers.51.block_sparse_moe.experts.5.w1",
1607
+ "model.layers.51.block_sparse_moe.experts.5.w2",
1608
+ "model.layers.51.block_sparse_moe.experts.5.w3",
1609
+ "model.layers.51.block_sparse_moe.experts.6.w1",
1610
+ "model.layers.51.block_sparse_moe.experts.6.w2",
1611
+ "model.layers.51.block_sparse_moe.experts.6.w3",
1612
+ "model.layers.51.block_sparse_moe.experts.7.w1",
1613
+ "model.layers.51.block_sparse_moe.experts.7.w2",
1614
+ "model.layers.51.block_sparse_moe.experts.7.w3",
1615
+ "model.layers.52.self_attn.q_proj",
1616
+ "model.layers.52.self_attn.k_proj",
1617
+ "model.layers.52.self_attn.v_proj",
1618
+ "model.layers.52.self_attn.o_proj",
1619
+ "model.layers.52.block_sparse_moe.gate",
1620
+ "model.layers.52.block_sparse_moe.experts.0.w1",
1621
+ "model.layers.52.block_sparse_moe.experts.0.w2",
1622
+ "model.layers.52.block_sparse_moe.experts.0.w3",
1623
+ "model.layers.52.block_sparse_moe.experts.1.w1",
1624
+ "model.layers.52.block_sparse_moe.experts.1.w2",
1625
+ "model.layers.52.block_sparse_moe.experts.1.w3",
1626
+ "model.layers.52.block_sparse_moe.experts.2.w1",
1627
+ "model.layers.52.block_sparse_moe.experts.2.w2",
1628
+ "model.layers.52.block_sparse_moe.experts.2.w3",
1629
+ "model.layers.52.block_sparse_moe.experts.3.w1",
1630
+ "model.layers.52.block_sparse_moe.experts.3.w2",
1631
+ "model.layers.52.block_sparse_moe.experts.3.w3",
1632
+ "model.layers.52.block_sparse_moe.experts.4.w1",
1633
+ "model.layers.52.block_sparse_moe.experts.4.w2",
1634
+ "model.layers.52.block_sparse_moe.experts.4.w3",
1635
+ "model.layers.52.block_sparse_moe.experts.5.w1",
1636
+ "model.layers.52.block_sparse_moe.experts.5.w2",
1637
+ "model.layers.52.block_sparse_moe.experts.5.w3",
1638
+ "model.layers.52.block_sparse_moe.experts.6.w1",
1639
+ "model.layers.52.block_sparse_moe.experts.6.w2",
1640
+ "model.layers.52.block_sparse_moe.experts.6.w3",
1641
+ "model.layers.52.block_sparse_moe.experts.7.w1",
1642
+ "model.layers.52.block_sparse_moe.experts.7.w2",
1643
+ "model.layers.52.block_sparse_moe.experts.7.w3",
1644
+ "model.layers.53.self_attn.q_proj",
1645
+ "model.layers.53.self_attn.k_proj",
1646
+ "model.layers.53.self_attn.v_proj",
1647
+ "model.layers.53.self_attn.o_proj",
1648
+ "model.layers.53.block_sparse_moe.gate",
1649
+ "model.layers.53.block_sparse_moe.experts.0.w1",
1650
+ "model.layers.53.block_sparse_moe.experts.0.w2",
1651
+ "model.layers.53.block_sparse_moe.experts.0.w3",
1652
+ "model.layers.53.block_sparse_moe.experts.1.w1",
1653
+ "model.layers.53.block_sparse_moe.experts.1.w2",
1654
+ "model.layers.53.block_sparse_moe.experts.1.w3",
1655
+ "model.layers.53.block_sparse_moe.experts.2.w1",
1656
+ "model.layers.53.block_sparse_moe.experts.2.w2",
1657
+ "model.layers.53.block_sparse_moe.experts.2.w3",
1658
+ "model.layers.53.block_sparse_moe.experts.3.w1",
1659
+ "model.layers.53.block_sparse_moe.experts.3.w2",
1660
+ "model.layers.53.block_sparse_moe.experts.3.w3",
1661
+ "model.layers.53.block_sparse_moe.experts.4.w1",
1662
+ "model.layers.53.block_sparse_moe.experts.4.w2",
1663
+ "model.layers.53.block_sparse_moe.experts.4.w3",
1664
+ "model.layers.53.block_sparse_moe.experts.5.w1",
1665
+ "model.layers.53.block_sparse_moe.experts.5.w2",
1666
+ "model.layers.53.block_sparse_moe.experts.5.w3",
1667
+ "model.layers.53.block_sparse_moe.experts.6.w1",
1668
+ "model.layers.53.block_sparse_moe.experts.6.w2",
1669
+ "model.layers.53.block_sparse_moe.experts.6.w3",
1670
+ "model.layers.53.block_sparse_moe.experts.7.w1",
1671
+ "model.layers.53.block_sparse_moe.experts.7.w2",
1672
+ "model.layers.53.block_sparse_moe.experts.7.w3",
1673
+ "model.layers.54.self_attn.q_proj",
1674
+ "model.layers.54.self_attn.k_proj",
1675
+ "model.layers.54.self_attn.v_proj",
1676
+ "model.layers.54.self_attn.o_proj",
1677
+ "model.layers.54.block_sparse_moe.gate",
1678
+ "model.layers.54.block_sparse_moe.experts.0.w1",
1679
+ "model.layers.54.block_sparse_moe.experts.0.w2",
1680
+ "model.layers.54.block_sparse_moe.experts.0.w3",
1681
+ "model.layers.54.block_sparse_moe.experts.1.w1",
1682
+ "model.layers.54.block_sparse_moe.experts.1.w2",
1683
+ "model.layers.54.block_sparse_moe.experts.1.w3",
1684
+ "model.layers.54.block_sparse_moe.experts.2.w1",
1685
+ "model.layers.54.block_sparse_moe.experts.2.w2",
1686
+ "model.layers.54.block_sparse_moe.experts.2.w3",
1687
+ "model.layers.54.block_sparse_moe.experts.3.w1",
1688
+ "model.layers.54.block_sparse_moe.experts.3.w2",
1689
+ "model.layers.54.block_sparse_moe.experts.3.w3",
1690
+ "model.layers.54.block_sparse_moe.experts.4.w1",
1691
+ "model.layers.54.block_sparse_moe.experts.4.w2",
1692
+ "model.layers.54.block_sparse_moe.experts.4.w3",
1693
+ "model.layers.54.block_sparse_moe.experts.5.w1",
1694
+ "model.layers.54.block_sparse_moe.experts.5.w2",
1695
+ "model.layers.54.block_sparse_moe.experts.5.w3",
1696
+ "model.layers.54.block_sparse_moe.experts.6.w1",
1697
+ "model.layers.54.block_sparse_moe.experts.6.w2",
1698
+ "model.layers.54.block_sparse_moe.experts.6.w3",
1699
+ "model.layers.54.block_sparse_moe.experts.7.w1",
1700
+ "model.layers.54.block_sparse_moe.experts.7.w2",
1701
+ "model.layers.54.block_sparse_moe.experts.7.w3",
1702
+ "model.layers.55.self_attn.q_proj",
1703
+ "model.layers.55.self_attn.k_proj",
1704
+ "model.layers.55.self_attn.v_proj",
1705
+ "model.layers.55.self_attn.o_proj",
1706
+ "model.layers.55.block_sparse_moe.gate",
1707
+ "model.layers.55.block_sparse_moe.experts.0.w1",
1708
+ "model.layers.55.block_sparse_moe.experts.0.w2",
1709
+ "model.layers.55.block_sparse_moe.experts.0.w3",
1710
+ "model.layers.55.block_sparse_moe.experts.1.w1",
1711
+ "model.layers.55.block_sparse_moe.experts.1.w2",
1712
+ "model.layers.55.block_sparse_moe.experts.1.w3",
1713
+ "model.layers.55.block_sparse_moe.experts.2.w1",
1714
+ "model.layers.55.block_sparse_moe.experts.2.w2",
1715
+ "model.layers.55.block_sparse_moe.experts.2.w3",
1716
+ "model.layers.55.block_sparse_moe.experts.3.w1",
1717
+ "model.layers.55.block_sparse_moe.experts.3.w2",
1718
+ "model.layers.55.block_sparse_moe.experts.3.w3",
1719
+ "model.layers.55.block_sparse_moe.experts.4.w1",
1720
+ "model.layers.55.block_sparse_moe.experts.4.w2",
1721
+ "model.layers.55.block_sparse_moe.experts.4.w3",
1722
+ "model.layers.55.block_sparse_moe.experts.5.w1",
1723
+ "model.layers.55.block_sparse_moe.experts.5.w2",
1724
+ "model.layers.55.block_sparse_moe.experts.5.w3",
1725
+ "model.layers.55.block_sparse_moe.experts.6.w1",
1726
+ "model.layers.55.block_sparse_moe.experts.6.w2",
1727
+ "model.layers.55.block_sparse_moe.experts.6.w3",
1728
+ "model.layers.55.block_sparse_moe.experts.7.w1",
1729
+ "model.layers.55.block_sparse_moe.experts.7.w2",
1730
+ "model.layers.55.block_sparse_moe.experts.7.w3",
1731
+ "lm_head"
1732
+ ],
1733
+ "registry_requires_subclass": false,
1734
+ "sparsity_structure": "unstructured",
1735
+ "targets": [
1736
+ "model.layers.0.self_attn.q_proj",
1737
+ "model.layers.0.self_attn.k_proj",
1738
+ "model.layers.0.self_attn.v_proj",
1739
+ "model.layers.1.self_attn.q_proj",
1740
+ "model.layers.1.self_attn.k_proj"
1741
+ ]
1742
+ }
1743
+ },
1744
+ "rms_norm_eps": 1e-05,
1745
+ "rope_theta": 1000000,
1746
+ "router_aux_loss_coef": 0.001,
1747
+ "router_jitter_noise": 0.0,
1748
+ "sliding_window": null,
1749
+ "tie_word_embeddings": false,
1750
+ "torch_dtype": "bfloat16",
1751
+ "transformers_version": "4.47.1",
1752
+ "use_cache": false,
1753
+ "vocab_size": 32000
1754
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.47.1"
+ }
model-00001-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d163d2389dca0d49aaa2112b051725c91aa59442a82639de074d8d4b3da66270
+ size 4951110552
model-00002-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:010a7e3f4b74f65ccbaa3cecce13a72480b9639606dc4a5ef1caa8497b6450dc
+ size 4960776216
model-00003-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fda97e68149fad9cb9b93580d03fdc4800faf5793ed9f5d997c74405aebc9d88
+ size 4960796840
model-00004-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7a497cd6b84e0ff4b34245c427bd0e0ce5fe9d52b164e8221b9b7b59c2312fe
+ size 4960776560
model-00005-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc686afeff0203e155644c4dc639ec01ac4be20a51c2c254c7310df1cf8ffc67
+ size 4960776560
model-00006-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6cd9149aaac3ad39d74f9bfa3e23a4630bb3834911304fb5b1c5dd27ca7abca4
+ size 4960797040
model-00007-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96b24de264534369322426f21dc8166e3c318fb7c5e33ebcf17f78ede23ea653
+ size 4960776560
model-00008-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab4fb18240041eb494d598028dbbd25d801b11dfc77c5acc64a03054b5b225b3
+ size 4960776560
model-00009-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96ae16be18429a3030be83c991ee0fc5df99cf76b4df3ece63da747a13c2a98d
+ size 4960797040
model-00010-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10264016bf9121aeed7bcd7c0333db35de0ddc342e6df29743634ec63250b4c7
+ size 4960776560
model-00011-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c8c5518cd096768a593f14f05b10a92eec5d68a1f7b5989214453754fe8ddb1
+ size 4960776560
model-00012-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:363eaa96e4c0d46b59d72b2b220811a642be113087eb6180ed1977111aeaf809
+ size 4960797040
model-00013-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2dce32dfe6fbbd5d0413d5a96555c2ec2e7b032f4e3cd3982c6b0ea504a7cdaf
+ size 4960776560
model-00014-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e176c640d8d58c88990c890a547810c79df5cb38550cc27b2493f6f4ea5b68b8
+ size 4960776560
model-00015-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fd868dd6c2fde7695183bfeb22b61a5fdfe8abee6fbb42856e058d0548ea092
+ size 1501135760
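Each shard above is stored as a Git LFS pointer file: the `oid sha256:` line records the hash of the actual weight file and `size` records its byte length. A downloaded shard can be checked against its pointer with a short script. This is a minimal sketch (the helper name is our own; the hash and size values come from the listing above), reading in chunks so multi-gigabyte shards are not loaded into memory at once:

```python
import hashlib


def verify_lfs_pointer(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Check a downloaded file against the oid/size recorded in its LFS pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Stream the file in 1 MiB chunks to keep memory use flat.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == expected_size and digest.hexdigest() == expected_sha256
```

For example, `verify_lfs_pointer("model-00015-of-00015.safetensors", "8fd868dd...", 1501135760)` should return `True` only when the download is intact.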
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
recipe.yaml ADDED
@@ -0,0 +1,14 @@
+ DEFAULT_stage:
+   DEFAULT_modifiers:
+     GPTQModifier:
+       sequential_update: true
+       targets: [Linear]
+       dampening_frac: 0.1
+       config_groups:
+         group_0:
+           targets: [Linear]
+           weights: {num_bits: 4, type: int, symmetric: true, strategy: channel, actorder: null,
+             observer: minmax}
+           input_activations: null
+           output_activations: null
+       ignore: [lm_head, 're:.*block_sparse_moe.gate']
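The recipe above encodes channel-wise, symmetric INT4 weight-only quantization via GPTQ, skipping the LM head and the MoE router gates. Its nesting can be read back programmatically; a minimal sketch assuming PyYAML is available, with an abbreviated copy of the recipe inlined (indentation reconstructed, since YAML nesting depends on it):

```python
import yaml  # PyYAML

# Abbreviated copy of recipe.yaml from this repository.
RECIPE = """
DEFAULT_stage:
  DEFAULT_modifiers:
    GPTQModifier:
      sequential_update: true
      targets: [Linear]
      dampening_frac: 0.1
      config_groups:
        group_0:
          targets: [Linear]
          weights: {num_bits: 4, type: int, symmetric: true, strategy: channel}
      ignore: [lm_head, 're:.*block_sparse_moe.gate']
"""

modifier = yaml.safe_load(RECIPE)["DEFAULT_stage"]["DEFAULT_modifiers"]["GPTQModifier"]
weights = modifier["config_groups"]["group_0"]["weights"]
print(weights["num_bits"], weights["strategy"])  # → 4 channel
```

The `ignore` entries line up with the model card's note that the MLP routers (and `lm_head`) are left unquantized.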
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "add_prefix_space": null,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [],
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "legacy": false,
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": null,
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }