bwshen-mi committed
Commit 60804bd · verified · 1 Parent(s): 9733bc5

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,274 @@
1
+ ---
2
+ license: mit
3
+ library_name: transformers
4
+ ---
5
+ <div align="center">
6
+ <picture>
7
+ <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
8
+ <img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Xiaomi-MiMo" />
9
+ </picture>
10
+ </div>
11
+
12
+ <h3 align="center">
13
+ <b>
14
+ <span>━━━━━━━━━━━━━━━━━━━━━━━━━</span>
15
+ <br/>
16
+ Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining
17
+ <br/>
18
+ <span>━━━━━━━━━━━━━━━━━━━━━━━━━</span>
19
+ <br/>
20
+ </b>
21
+ </h3>
22
+
23
+ <br/>
24
+
25
+ <div align="center" style="line-height: 1;">
26
+ |
27
+ <a href="https://huggingface.co/collections/XiaomiMiMo/mimo-6811688ee20ba7d0682f5cb9" target="_blank">🤗 HuggingFace</a>
28
+ &nbsp;|
29
+ <a href="https://www.modelscope.cn/collections/MiMo-7edb0ab729c744" target="_blank">🤖️ ModelScope</a>
30
+ &nbsp;|
31
+ <a href="https://arxiv.org/abs/2505.07608" target="_blank">📔 Technical Report</a>
32
+ &nbsp;|
33
+ <br/>
34
+ </div>
35
+
36
+ <br/>
37
+
38
+ ---
39
+
40
+ ## Updates
41
+
42
+ [2025.05.30] During RL training, by continuously extending the training context window (from 32K to 48K), the performance of MiMo-7B-RL-0530 on AIME24 improves steadily and eventually surpasses that of DeepSeek R1.
43
+
44
+ <table>
45
+ <thead>
46
+ <tr>
47
+ <th>Benchmark</th>
48
+ <th>MiMo-7B-RL</th>
49
+ <th>MiMo-7B-RL-0530</th>
50
+ </tr>
51
+ </thead>
52
+ <tbody>
53
+ <tr>
54
+ <td colspan="3"><strong>Mathematics</strong></td>
55
+ <td rowspan="11"><img width="80%" src="https://github.com/XiaomiMiMo/MiMo-test/raw/main/figures/length.jpg?raw=true"></td>
58
+ </tr>
59
+ <tr><td>MATH500<br/>(Pass@1)</td><td>95.8</td><td>97.2</td></tr>
60
+ <tr><td>AIME 2024<br/>(Pass@1)</td><td>68.2</td><td>80.1</td></tr>
61
+ <tr><td>AIME 2025<br/>(Pass@1)</td><td>55.4</td><td>70.2</td></tr>
62
+ <tr><td colspan="3"><strong>Code</strong></td></tr>
63
+ <tr><td>LiveCodeBench v5<br/>(Pass@1)</td><td>57.8</td><td>60.9</td></tr>
64
+ <tr><td>LiveCodeBench v6<br/>(Pass@1)</td><td>49.3</td><td>52.2</td></tr>
65
+ <tr><td colspan="3"><strong>STEM</strong></td></tr>
66
+ <tr><td>GPQA-Diamond<br/>(Pass@1)</td><td>54.4</td><td>60.6</td></tr>
67
+ <tr><td colspan="3"><strong>General</strong></td></tr>
68
+ <tr><td>AlignBench 1.1<br/>(Evaluated by GPT-4.1)</td><td>6.9</td><td>7.4</td></tr>
69
+ </tbody>
70
+ </table>
71
+
72
+ ---
73
+
74
+ ## I. Introduction
75
+
76
+ Currently, most successful RL work, including open-source research, relies on relatively large base models, e.g., 32B models, particularly for enhancing code reasoning capabilities. Moreover, it was widely believed that achieving uniform and simultaneous improvements in both mathematical and code capabilities within a small model is challenging. Nonetheless, we believe that the effectiveness of an RL-trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.
77
+
78
+ In this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments starting from MiMo-7B-Base show that the model possesses extraordinary reasoning potential, even surpassing much larger 32B models. Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.
79
+
80
+ <p align="center">
81
+ <img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/curve.png?raw=true">
82
+ </p>
83
+
84
+ We open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model.
85
+ We believe this report, along with the models, will provide valuable insights for developing powerful reasoning LLMs that benefit the larger community.
86
+
87
+ ### 🌟 Highlights
88
+
89
+ - **Pre-Training: Base Model Born for Reasoning**
90
+ - We optimize the data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase the reasoning pattern density in pre-training data. We also employ multiple strategies to generate a large volume of diverse synthetic reasoning data.
91
+ - We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
92
+ - We incorporate Multi-Token Prediction (MTP) as an additional training objective, which enhances model performance and accelerates inference.
93
+
94
+ - **Post-Training Recipe: Pioneering Reasoning Model**
95
+ - We curate 130K mathematics and code problems as RL training data, which can be verified by rule-based verifiers. Each problem undergoes careful cleaning and difficulty assessment to ensure quality. We employ only rule-based accuracy rewards to avoid potential reward hacking.
96
+ - To mitigate the sparse-reward issue for challenging code problems, we introduce a test-difficulty-driven code reward. By assigning fine-grained scores to test cases of varying difficulty levels, the policy can be optimized more effectively via a dense reward signal (a minimal sketch follows these highlights).
97
+ - We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.
98
+
99
+ - **RL Infrastructure**
100
+ - We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation.
101
+ - We support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
102
+
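+ The exact reward formulation is given in the technical report; the snippet below is only an illustrative sketch of the test-difficulty-driven idea: weighting test cases by difficulty so that partially correct solutions on hard problems still receive a dense signal. The weighting scheme and function name are our own illustration, not the training code.
+
+ ```py
+ # Illustrative sketch only (not MiMo's actual RL reward implementation):
+ # weight each passed test case by its difficulty so that hard problems yield
+ # a fractional reward instead of a sparse 0/1 signal.
+ def test_difficulty_driven_reward(passed, difficulties):
+     """passed[i]: whether test case i passed; difficulties[i]: positive weight,
+     larger for harder test cases."""
+     total = sum(difficulties)
+     if total == 0:
+         return 0.0
+     earned = sum(d for ok, d in zip(passed, difficulties) if ok)
+     return earned / total
+
+ # A solution that passes the two easy tests but fails the hard one still
+ # receives partial credit rather than a reward of 0.
+ print(test_difficulty_driven_reward([True, True, False], [1.0, 1.0, 3.0]))  # 0.4
+ ```
+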
103
+ ## II. Model Details
104
+
105
+ The MTP layers of MiMo-7B are tuned during pretraining and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is approximately 90%.
106
+
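+ As a rough back-of-the-envelope estimate (our own illustration, ignoring drafting overhead): with a single draft token per step and this acceptance rate, each verification pass emits on average about 1 + p tokens, i.e., roughly 1.9 tokens per main-model forward pass.
+
+ ```py
+ # Back-of-the-envelope estimate (illustration only): with one MTP layer
+ # drafting a single token per step, each main-model forward pass emits the
+ # verified token plus the draft token whenever the draft is accepted.
+ acceptance_rate = 0.90
+ tokens_per_forward = 1 + acceptance_rate
+ print(f"~{tokens_per_forward:.2f} tokens per forward pass")  # ~1.90
+ ```
+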
107
+ <p align="center">
108
+ <img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/architecture.png?raw=true">
109
+ </p>
110
+
111
+ > Models are available at [Huggingface Collections: MiMo](https://huggingface.co/collections/XiaomiMiMo/mimo-6811688ee20ba7d0682f5cb9) and [ModelScope Collections: MiMo](https://www.modelscope.cn/collections/MiMo-7edb0ab729c744)
112
+
113
+
114
+ | **Model** | **Description** | **Download (HuggingFace)** | **Download (ModelScope)** |
115
+ | :-------------: | :---------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------: |
116
+ | MiMo-7B-Base | Base model with extraordinary reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [🤖️ XiaomiMiMo/MiMo-7B-Base](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
117
+ | MiMo-7B-RL-Zero | RL model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [🤖️ XiaomiMiMo/MiMo-7B-RL-Zero](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
118
+ | MiMo-7B-SFT | SFT model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [🤖️ XiaomiMiMo/MiMo-7B-SFT](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
119
+ | MiMo-7B-RL | RL model trained from SFT model, superior performance matching OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [🤖️ XiaomiMiMo/MiMo-7B-RL](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |
120
+ | MiMo-7B-RL-0530 | Advanced RL model with extended length | [🤗 XiaomiMiMo/MiMo-7B-RL-0530](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-0530) | [🤖️ XiaomiMiMo/MiMo-7B-RL-0530](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-0530) |
121
+
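+ The checkpoints can also be fetched programmatically. Below is a minimal sketch using `huggingface_hub` (the repo id matches the table above; the local directory is an arbitrary example path):
+
+ ```py
+ from huggingface_hub import snapshot_download
+
+ # Download the MiMo-7B-RL-0530 checkpoint (local_dir is an example path).
+ local_path = snapshot_download(
+     repo_id="XiaomiMiMo/MiMo-7B-RL-0530",
+     local_dir="./MiMo-7B-RL-0530",
+ )
+ print(f"Checkpoint downloaded to {local_path}")
+ ```
+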
122
+ ## III. Evaluation Results
123
+
124
+ | Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
125
+ | ----------------------------- | :---------: | :--------------------: | :------------: | :-------------: | :-----------------: | :----------------: | :--------: |
126
+ | **General** | | | | | | | |
127
+ | GPQA Diamond<br/>(Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
128
+ | SuperGPQA<br/>(Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
129
+ | DROP<br/>(3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
130
+ | MMLU-Pro<br/>(EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
131
+ | IF-Eval<br/>(Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
132
+ | **Mathematics** | | | | | | | |
133
+ | MATH-500<br/>(Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
134
+ | AIME 2024<br/>(Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
135
+ | AIME 2025<br/>(Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
136
+ | **Code** | | | | | | | |
137
+ | LiveCodeBench v5<br/>(Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
138
+ | LiveCodeBench v6<br/>(Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
139
+
140
+ MiMo-7B series
141
+
142
+ | Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL | MiMo-7B-RL-0530 |
143
+ | ----------------------------- | :----------: | :-------------: | :---------: | :--------: | :-------------: |
144
+ | **Mathematics** | | | | | |
145
+ | MATH500<br/>(Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 | 97.2 |
146
+ | AIME 2024<br/>(Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 | 80.1 |
147
+ | AIME 2025<br/>(Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 | 70.2 |
148
+ | **Code** | | | | | |
149
+ | LiveCodeBench v5<br/>(Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 | 60.9 |
150
+ | LiveCodeBench v6<br/>(Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 | 52.2 |
151
+
152
+ > [!IMPORTANT]
153
+ > The evaluations are conducted with `temperature=0.6`.
154
+ >
155
+ > AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA are evaluated with a single run.
156
+
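+ For clarity, "averaged score of N repetitions" simply means the per-run Pass@1 accuracies are averaged over N independent runs; a minimal sketch with made-up numbers:
+
+ ```py
+ # Averaged Pass@1 over k repetitions: run the benchmark k times and average
+ # the per-run accuracy (the values below are made-up placeholders).
+ def averaged_pass_at_1(per_run_accuracies):
+     return sum(per_run_accuracies) / len(per_run_accuracies)
+
+ runs = [0.75, 0.80, 0.85, 0.80]  # e.g. 4 of the 32 AIME repetitions
+ print(f"Pass@1 = {averaged_pass_at_1(runs) * 100:.1f}%")  # 80.0%
+ ```
+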
157
+ ## IV. Deployment
158
+
159
+ ### SGLang Inference
160
+
161
+ Thanks to the [contribution](https://github.com/sgl-project/sglang/pull/5921) from the SGLang team, MiMo was supported in mainline SGLang within 24 hours, with MTP support coming soon.
162
+
163
+ Example Script
164
+
165
+ ```bash
166
+ # Install the latest SGLang from the main branch
167
+ python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"
168
+
169
+ # Launch SGLang Server
170
+ python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code
171
+ ```
172
+
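+ Once the server is up, it exposes an OpenAI-compatible API; a minimal request sketch (assuming SGLang's default port 30000, adjust if you pass `--port`):
+
+ ```py
+ # Query the SGLang server launched above via its OpenAI-compatible endpoint.
+ # Port 30000 is SGLang's default; change the URL if you launched with --port.
+ import requests
+
+ resp = requests.post(
+     "http://localhost:30000/v1/chat/completions",
+     json={
+         "model": "XiaomiMiMo/MiMo-7B-RL",
+         "messages": [
+             {"role": "system", "content": ""},  # empty system prompt, as recommended below
+             {"role": "user", "content": "What is 17 * 24?"},
+         ],
+         "temperature": 0.6,
+     },
+ )
+ print(resp.json()["choices"][0]["message"]["content"])
+ ```
+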
173
+ Detailed usage can be found in the [SGLang documentation](https://docs.sglang.ai/backend/send_request.html). MTP will also be supported within 24 hours.
174
+
175
+ ### vLLM inference
176
+
177
+ 1. [Recommended] We officially support inference with MiMo-MTP using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp_stable_073).
178
+
179
+ Example script
180
+
181
+ ```py
182
+ from vllm import LLM, SamplingParams
183
+
184
+ model_path = "/path/to/MiMo"
185
+ llm = LLM(
186
+ model=model_path,
187
+ trust_remote_code=True,
188
+ num_speculative_tokens=1,
189
+ disable_log_stats=False
190
+ )
191
+ sampling_params = SamplingParams(temperature=0.6)
192
+
193
+ conversation = [
194
+ {
195
+ "role": "system",
196
+ "content": ""
197
+ },
198
+ {
199
+ "role": "user",
200
+ "content": "Write an essay about the importance of higher education.",
201
+ },
202
+ ]
203
+
204
+ outputs = llm.chat(conversation,
205
+ sampling_params=sampling_params,
206
+ use_tqdm=False)
207
+
208
+ for output in outputs:
209
+     prompt = output.prompt
210
+     generated_text = output.outputs[0].text
211
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
212
+
213
+ print("=" * 80)
214
+ ```
215
+
216
+ 2. Or, you can register a vLLM loader for MiMo without loading MTP parameters.
217
+
218
+ You can copy the [`registry/register_mimo_in_vllm.py`](https://github.com/XiaomiMiMo/MiMo/blob/main/registry/register_mimo_in_vllm.py) to your directory and import it with
219
+
220
+ ```py
221
+ import register_mimo_in_vllm
222
+
223
+ from vllm import LLM, SamplingParams
224
+
225
+ model_path = "/path/to/MiMo"
226
+ llm = LLM(
227
+ model=model_path,
228
+ trust_remote_code=True,
229
+ # num_speculative_tokens=1,
230
+ disable_log_stats=False
231
+ )
232
+ sampling_params = SamplingParams(temperature=0.6)
233
+ ```
234
+
235
+ ### HuggingFace inference
236
+
237
+ Example script
238
+
239
+ ```py
240
+ from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
241
+
242
+ model_id = "XiaomiMiMo/MiMo-7B-RL-0530"
243
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
244
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
245
+ inputs = tokenizer(["Today is"], return_tensors='pt')
246
+ output = model.generate(**inputs, max_new_tokens = 100)
247
+ print(tokenizer.decode(output.tolist()[0]))
248
+ ```
249
+
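+ For chat-style prompting with the RL model, the prompt should go through the tokenizer's chat template (the same template used by `llm.chat` in the vLLM example above); a minimal sketch:
+
+ ```py
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "XiaomiMiMo/MiMo-7B-RL-0530"
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ messages = [
+     {"role": "system", "content": ""},  # empty system prompt, as recommended below
+     {"role": "user", "content": "Briefly explain speculative decoding."},
+ ]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt")
+ output = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
+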
250
+ ### Recommended environment and prompts
251
+
252
+ - We recommend using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp_stable_073), which is based on vLLM 0.7.3.
253
+ - We recommend using an empty system prompt.
254
+
255
+ > We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the Hugging Face repo 💻.
256
+
257
+ ## V. Citation
258
+
259
+ ```bibtex
260
+ @misc{coreteam2025mimounlockingreasoningpotential,
261
+ title={MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining},
262
+ author={{Xiaomi LLM-Core Team}},
263
+ year={2025},
264
+ eprint={2505.07608},
265
+ archivePrefix={arXiv},
266
+ primaryClass={cs.CL},
267
+ url={https://arxiv.org/abs/2505.07608},
268
+ }
269
+ ```
270
+
271
+
272
+ ## VI. Contact
273
+
274
+ Please contact us at [[email protected]](mailto:[email protected]) or open an issue if you have any questions.
added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "architectures": [
3
+ "MiMoForCausalLM"
4
+ ],
5
+ "attention_bias": true,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_mimo.MiMoConfig",
9
+ "AutoModel": "modeling_mimo.MiMoModel",
10
+ "AutoModelForCausalLM": "modeling_mimo.MiMoForCausalLM"
11
+ },
12
+ "bos_token_id": 151643,
13
+ "eos_token_id": 151645,
14
+ "head_dim": 128,
15
+ "hidden_act": "silu",
16
+ "hidden_size": 4096,
17
+ "initializer_range": 0.02,
18
+ "intermediate_size": 11008,
19
+ "max_position_embeddings": 65536,
20
+ "max_window_layers": 36,
21
+ "model_type": "mimo",
22
+ "num_attention_heads": 32,
23
+ "num_hidden_layers": 36,
24
+ "num_key_value_heads": 8,
25
+ "num_nextn_predict_layers": 1,
26
+ "rms_norm_eps": 1e-05,
27
+ "rope_scaling": null,
28
+ "rope_theta": 640000,
29
+ "sliding_window": 65536,
30
+ "tie_word_embeddings": false,
31
+ "torch_dtype": "bfloat16",
32
+ "transformers_version": "4.51.1",
33
+ "use_cache": true,
34
+ "use_mrope": false,
35
+ "use_sliding_window": false,
36
+ "vocab_size": 151680
37
+ }
configuration_mimo.py ADDED
@@ -0,0 +1,16 @@
1
+ from transformers.models.qwen2.configuration_qwen2 import Qwen2Config
2
+
3
+ class MiMoConfig(Qwen2Config):
4
+ model_type = "mimo"
5
+
6
+ def __init__(
7
+ self,
8
+ *args,
9
+ num_nextn_predict_layers=0,
10
+ **kwargs
11
+ ):
12
+ self.num_nextn_predict_layers = num_nextn_predict_layers
13
+ super().__init__(
14
+ *args,
15
+ **kwargs,
16
+ )
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 151643,
4
+ "eos_token_id": 151645,
5
+ "max_new_tokens": 2048,
6
+ "transformers_version": "4.51.1"
7
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b4d1566074fe7c7afaa396ba1eb95a34890b48a8433e89a19fa99c4b4c951db
3
+ size 3987958288
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06dde056e1e06ce66ab72b70da852f2a92ad6e2da4c99f5860b8273ad3dd0e0b
3
+ size 3989130384
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:445637e4aa2cba64070751acaf06017ae29cd5cd57dfb479a3b1a5ee2c9897c5
3
+ size 3982835320
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1ea475d9ab1093ce5ceebf8852b828e1b7c9829351cfc911d24e0f1e4d778056
3
+ size 3706946944
model.safetensors.index.json ADDED
@@ -0,0 +1,458 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 15666819072
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00004-of-00004.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
19
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
26
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
28
+ "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
29
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
30
+ "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
31
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
32
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
34
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
38
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
40
+ "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
41
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
43
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
50
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
53
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
55
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
62
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
63
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
64
+ "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
65
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
67
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
69
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
70
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
71
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
73
+ "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
74
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
77
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
79
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
80
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
81
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
82
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
83
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
84
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
85
+ "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
86
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
87
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
89
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
90
+ "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
91
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
92
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
93
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
94
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
95
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
96
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
97
+ "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
98
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
99
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
100
+ "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
101
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
102
+ "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
103
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
104
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
105
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
106
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
107
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
108
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
109
+ "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
110
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
111
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
112
+ "model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
113
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
114
+ "model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
115
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
117
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
118
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
119
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
120
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
121
+ "model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
122
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
123
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
124
+ "model.layers.17.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
125
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
126
+ "model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
127
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
128
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
129
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
130
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
131
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
132
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
133
+ "model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
134
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
135
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
136
+ "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
137
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
138
+ "model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
139
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
140
+ "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
141
+ "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
142
+ "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
143
+ "model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
144
+ "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
145
+ "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
146
+ "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
147
+ "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
148
+ "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
149
+ "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
150
+ "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
151
+ "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
152
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
153
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
154
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
155
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
156
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
157
+ "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
158
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
159
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
160
+ "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
161
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
162
+ "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
163
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
164
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
165
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
166
+ "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
167
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
168
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
169
+ "model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
170
+ "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
171
+ "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
172
+ "model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
173
+ "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
174
+ "model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
175
+ "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
176
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
177
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
178
+ "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
179
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
180
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
181
+ "model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
182
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
183
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
184
+ "model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
185
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
186
+ "model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
187
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
188
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
189
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
190
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
191
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
192
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
193
+ "model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
194
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
195
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
196
+ "model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
197
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
198
+ "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
199
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
200
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
201
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
202
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
203
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
204
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
205
+ "model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
206
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
207
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
208
+ "model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
209
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
210
+ "model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
211
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
212
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
213
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
214
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
215
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
216
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
217
+ "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
218
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
219
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
220
+ "model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
221
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
222
+ "model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
223
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
224
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
225
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
226
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
227
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
228
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
229
+ "model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
230
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
231
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
232
+ "model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
233
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
234
+ "model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
235
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
236
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
237
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
238
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
239
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
240
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
241
+ "model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
242
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
243
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
244
+ "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
245
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
246
+ "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
247
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
248
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
249
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
250
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
251
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
252
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
253
+ "model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
254
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
255
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
256
+ "model.layers.27.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
257
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
258
+ "model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
259
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
260
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
261
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
262
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
263
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
264
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
265
+ "model.layers.28.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
266
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
267
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
268
+ "model.layers.28.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
269
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
270
+ "model.layers.28.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
271
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
272
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
273
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
274
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
275
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
276
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
277
+ "model.layers.29.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
278
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
279
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
280
+ "model.layers.29.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
281
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
282
+ "model.layers.29.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
283
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
284
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
285
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
286
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
287
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
288
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
289
+ "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
290
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
291
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
292
+ "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
293
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
294
+ "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
295
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
296
+ "model.layers.30.input_layernorm.weight": "model-00004-of-00004.safetensors",
297
+ "model.layers.30.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
298
+ "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
299
+ "model.layers.30.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
300
+ "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
301
+ "model.layers.30.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
302
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
303
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
304
+ "model.layers.30.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
305
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
306
+ "model.layers.30.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
307
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
308
+ "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
309
+ "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
310
+ "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
311
+ "model.layers.31.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
312
+ "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
313
+ "model.layers.31.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
314
+ "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
315
+ "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
316
+ "model.layers.31.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
317
+ "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
318
+ "model.layers.31.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
319
+ "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
320
+ "model.layers.32.input_layernorm.weight": "model-00004-of-00004.safetensors",
321
+ "model.layers.32.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
322
+ "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
323
+ "model.layers.32.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
324
+ "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
325
+ "model.layers.32.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
326
+ "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
327
+ "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
328
+ "model.layers.32.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
329
+ "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
330
+ "model.layers.32.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
331
+ "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
332
+ "model.layers.33.input_layernorm.weight": "model-00004-of-00004.safetensors",
333
+ "model.layers.33.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
334
+ "model.layers.33.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
335
+ "model.layers.33.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
336
+ "model.layers.33.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
337
+ "model.layers.33.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
338
+ "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
339
+ "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
340
+ "model.layers.33.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
341
+ "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
342
+ "model.layers.33.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
343
+ "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
344
+ "model.layers.34.input_layernorm.weight": "model-00004-of-00004.safetensors",
345
+ "model.layers.34.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
346
+ "model.layers.34.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
347
+ "model.layers.34.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
348
+ "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
349
+ "model.layers.34.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
350
+ "model.layers.34.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
351
+ "model.layers.34.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
352
+ "model.layers.34.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
353
+ "model.layers.34.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
354
+ "model.layers.34.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
355
+ "model.layers.34.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
356
+ "model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
357
+ "model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
358
+ "model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
359
+ "model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
360
+ "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
361
+ "model.layers.35.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
362
+ "model.layers.35.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
363
+ "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
364
+ "model.layers.35.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
365
+ "model.layers.35.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
366
+ "model.layers.35.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
367
+ "model.layers.35.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
368
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
369
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
370
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
371
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
372
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
373
+ "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
374
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
375
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
376
+ "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
377
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
378
+ "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
379
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
380
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
381
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
382
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
383
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
384
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
385
+ "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
386
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
387
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
388
+ "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
389
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
390
+ "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
391
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
392
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
393
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
394
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
395
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
396
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
397
+ "model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
398
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
399
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
400
+ "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
401
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
402
+ "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
403
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
404
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
405
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
406
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
407
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
408
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
409
+ "model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
410
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
411
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
412
+ "model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
413
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
414
+ "model.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
415
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
416
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
417
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
418
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
419
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
420
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
421
+ "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
422
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
423
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
424
+ "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
425
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
426
+ "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
427
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
428
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
429
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
430
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
431
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
432
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
433
+ "model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
434
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
435
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
436
+ "model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
437
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
438
+ "model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
439
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
440
+ "model.mtp_layers.0.final_layernorm.weight": "model-00004-of-00004.safetensors",
441
+ "model.mtp_layers.0.hidden_layernorm.weight": "model-00004-of-00004.safetensors",
442
+ "model.mtp_layers.0.input_layernorm.weight": "model-00004-of-00004.safetensors",
443
+ "model.mtp_layers.0.input_proj.weight": "model-00004-of-00004.safetensors",
444
+ "model.mtp_layers.0.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
445
+ "model.mtp_layers.0.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
446
+ "model.mtp_layers.0.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
447
+ "model.mtp_layers.0.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
448
+ "model.mtp_layers.0.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
449
+ "model.mtp_layers.0.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
450
+ "model.mtp_layers.0.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
451
+ "model.mtp_layers.0.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
452
+ "model.mtp_layers.0.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
453
+ "model.mtp_layers.0.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
454
+ "model.mtp_layers.0.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
455
+ "model.mtp_layers.0.token_layernorm.weight": "model-00004-of-00004.safetensors",
456
+ "model.norm.weight": "model-00004-of-00004.safetensors"
457
+ }
458
+ }
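The `weight_map` above is the standard `transformers` sharded-checkpoint index: each parameter name points to the shard that stores it, and `from_pretrained` uses it to pull tensors from the right `model-0000X-of-00004.safetensors` file. Note that the `model.mtp_layers.0.*` multi-token-prediction weights sit in the last shard. A minimal sketch of inspecting the index directly; the repo id below is an illustrative assumption, not part of this commit:

```python
# Sketch: see which shard holds each tensor in the sharded safetensors checkpoint.
import json
from collections import Counter

from huggingface_hub import hf_hub_download

index_path = hf_hub_download(
    repo_id="XiaomiMiMo/MiMo-7B-RL-0530",        # assumption: repository id used for illustration
    filename="model.safetensors.index.json",
)
with open(index_path) as f:
    index = json.load(f)

weight_map = index["weight_map"]
print(weight_map["model.mtp_layers.0.input_proj.weight"])  # -> model-00004-of-00004.safetensors
print(Counter(weight_map.values()))                        # number of tensors per shard
```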
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
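`special_tokens_map.json` declares the chat end-of-sequence token (`<|im_end|>`), the padding token (`<|endoftext|>`), and the additional special tokens the tokenizer should treat as single, unsplittable ids. A small sanity check, assuming the same illustrative repo id as above:

```python
# Sketch: confirm the special tokens declared in special_tokens_map.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("XiaomiMiMo/MiMo-7B-RL-0530")  # assumed repo id

print(tok.eos_token)                              # <|im_end|>
print(tok.pad_token)                              # <|endoftext|>
print(tok.additional_special_tokens[:3])          # first few extra special tokens
print(tok.convert_tokens_to_ids("<|im_end|>"))    # 151645, per added_tokens_decoder below
```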
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
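Because `tokenizer.json` is tracked with Git LFS (see the `.gitattributes` change in this commit), the repository stores only the pointer above; the ~11 MB tokenizer is resolved when the file is actually downloaded. A hedged sketch of fetching it and checking it against the pointer, again with the repo id as an assumption:

```python
# Sketch: download the LFS-backed tokenizer.json and verify it matches the pointer file.
import hashlib

from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="XiaomiMiMo/MiMo-7B-RL-0530", filename="tokenizer.json")  # assumed repo id
data = open(path, "rb").read()

print(len(data))                          # should equal the pointer's "size" field
print(hashlib.sha256(data).hexdigest())   # should equal the pointer's sha256 oid
```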
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are MiMo, an AI assistant developed by Xiaomi.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are MiMo, an AI assistant developed by Xiaomi.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n' }}\n {%- endif %}\n {%- if enable_thinking is defined and enable_thinking is true %}\n {{- '<think>\\n' }}\n {%- endif %}\n{%- endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
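The `chat_template` above wraps each turn in `<|im_start|>`/`<|im_end|>` blocks, injects a default MiMo system prompt when none is given, renders tool schemas inside `<tools>` tags, and optionally seeds the assistant turn with a `<think>` block when an `enable_thinking` flag is supplied. A minimal usage sketch; the repo id is an assumption, and in recent `transformers` versions `apply_chat_template` forwards extra keyword arguments such as `enable_thinking` into the template context:

```python
# Sketch: render a prompt with the chat template defined in tokenizer_config.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("XiaomiMiMo/MiMo-7B-RL-0530")  # assumed repo id

messages = [{"role": "user", "content": "Compute 21 * 2."}]
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # template then emits an opening <think> tag for the assistant turn
)
print(prompt)
# Expected shape of the rendered prompt:
# <|im_start|>system
# You are MiMo, an AI assistant developed by Xiaomi.<|im_end|>
# <|im_start|>user
# Compute 21 * 2.<|im_end|>
# <|im_start|>assistant
# <think>
```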
vocab.json ADDED
The diff for this file is too large to render. See raw diff